* Caught the first erroneous translated errorcode
@ 2017-06-17 10:50 Willem Jan Withagen
  2017-06-17 17:52 ` John Spray
  0 siblings, 1 reply; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-17 10:50 UTC (permalink / raw)
  To: Ceph Development

Hi,

I think I've found the first case where the errno translation (ceph ->
hostos -> client-ceph) goes wrong.

Repeatedly I get the following error:
116: /home/jenkins/workspace/ceph-master/src/test/libradosstriper/rados-striper.sh:42: run:  rados --pool rbd --striper put toyfile td/rados-striper/toyfile
116: 2017-06-17 12:32:05.290234 810016000 -1 libradosstriper: RadosStriperImpl::openStripedObjectForWrite : could not set new size for toyfile : rc = -125error putting rbd/toyfile: (125) Unknown error: 125

125 is ECANCELED on Linux, but FreeBSD has:
#define ECANCELED       85              /* Operation canceled */

So the server probably returns ECANCELED in network format (125),
but the client does not translate it back.

Now, if I want to add some logging for this in Freebsd_errno.cc,
what is the easiest way to get something into the logs?
What descriptor is available there?

--WjW

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Caught the first erroneous translated errorcode
  2017-06-17 10:50 Caught the first erroneous translated errorcode Willem Jan Withagen
@ 2017-06-17 17:52 ` John Spray
  2017-06-17 20:59   ` Willem Jan Withagen
  2017-06-19 17:29   ` Adam C. Emerson
  0 siblings, 2 replies; 17+ messages in thread
From: John Spray @ 2017-06-17 17:52 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: Ceph Development

On Sat, Jun 17, 2017 at 11:50 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> Hi,
>
> I think I've found the first fact where the errno translation (ceph ->
> hostos -> client-ceph ) goes wrong....
>
> Repeatedly I get the following error:
> 116:
> /home/jenkins/workspace/ceph-master/src/test/libradosstriper/rados-striper.sh:42:
> run:  rados --pool rbd --striper put toy
> file td/rados-striper/toyfile
> 116: 2017-06-17 12:32:05.290234 810016000 -1 libradosstriper:
> RadosStriperImpl::openStripedObjectForWrite : could not set new s
> ize for toyfile : rc = -125error putting rbd/toyfile: (125) Unknown
> error: 125
>
> 125 is ECANCELD on Linux
> but FreeBSD
> #define ECANCELED       85              /* Operation canceled */
>
> So probably the server returns ECANCELD in network format (125)
> but the client does not translate back...

Somewhat related perhaps: people running cephfs on ARM recently had
this problem. In that case the solution was simply to define some
constants in Ceph that mirror the Linux ones; see commit 88d2da5e9.

John

> Now if I want to have some logging for this in Freebsd_errno.cc
> What is the easiest way to get something in the logs?
> What descriptor is available there?
>
> --WjW

* Re: Caught the first erroneous translated errorcode
  2017-06-17 17:52 ` John Spray
@ 2017-06-17 20:59   ` Willem Jan Withagen
  2017-06-18 17:18     ` Willem Jan Withagen
  2017-06-19 17:29   ` Adam C. Emerson
  1 sibling, 1 reply; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-17 20:59 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

On 17-6-2017 19:52, John Spray wrote:
> On Sat, Jun 17, 2017 at 11:50 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> Hi,
>>
>> I think I've found the first fact where the errno translation (ceph ->
>> hostos -> client-ceph ) goes wrong....
>>
>> Repeatedly I get the following error:
>> 116:
>> /home/jenkins/workspace/ceph-master/src/test/libradosstriper/rados-striper.sh:42:
>> run:  rados --pool rbd --striper put toy
>> file td/rados-striper/toyfile
>> 116: 2017-06-17 12:32:05.290234 810016000 -1 libradosstriper:
>> RadosStriperImpl::openStripedObjectForWrite : could not set new s
>> ize for toyfile : rc = -125error putting rbd/toyfile: (125) Unknown
>> error: 125
>>
>> 125 is ECANCELD on Linux
>> but FreeBSD
>> #define ECANCELED       85              /* Operation canceled */
>>
>> So probably the server returns ECANCELD in network format (125)
>> but the client does not translate back...
> 
> Somewhat related perhaps: people running cephfs on ARM recently had
> this problem, for that case the solution was simply to define in Ceph
> some constants that mirror the linux ones, see commit 88d2da5e9.

Hi John,

I think that is at the other end from where I'm working.
That commit is about the flags being passed to file open,
which sort of surprises me, since that is Linux <> Linux.
So perhaps it is all about big-endian <> little-endian.

My PR is about the error codes that differ between Linux and
FreeBSD. A FreeBSD client would not (correctly) understand the error
codes that a Linux server issues.
So on a server I translate all host error codes into wire codes, which
are the Linux codes (hostos_to_ceph_errno_conv()), and in a FreeBSD
client I translate the wire error codes into FreeBSD codes
(ceph_to_hostos_errno_conv()).

Specifically in this case:
ECANCELED is 125 on Linux, and that is the on-wire code.
So the servers started in the test will signal ECANCELED with value 125,
but because the rados-striper code does not translate that back into 85
(FreeBSD ECANCELED) it is reported as an unknown error, whereas in this
part of the code ECANCELED is a valid return value and is adequately
handled.

So that is why I suspect that the rados code does not take this
translation into account. That is not surprising, since only the code
from OS to wire was available; the return path was not included until
I introduced it. But it is hard to find all the locations where it should
be applied.

So I'm looking for (all) the correct places to insert:
	ceph_to_hostos_errno_conv(err)
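
For readers following along, the two directions described above can be sketched roughly as below. This is only an illustration under stated assumptions: the table holds the single pair of values mentioned in this thread (125 on the wire, 85 on FreeBSD), and the names sketch_table, sketch_ceph_to_hostos and sketch_hostos_to_ceph are made up here, not the functions in the tree.

```cpp
#include <cstdint>

// Values taken from this thread; the names are hypothetical.
constexpr int32_t WIRE_ECANCELED = 125;  // Linux numbering, used on the wire
constexpr int32_t BSD_ECANCELED  = 85;   // FreeBSD <errno.h>

struct errno_pair {
  int32_t wire;
  int32_t host;
};

// Sketch only: a real table needs an entry for every code that differs,
// otherwise an unlisted host value can collide with another wire value.
constexpr errno_pair sketch_table[] = {
  { WIRE_ECANCELED, BSD_ECANCELED },
};

// Wire -> host, keeping the sign (errors travel as negative values).
constexpr int32_t sketch_ceph_to_hostos(int32_t r) {
  const int32_t a = r < 0 ? -r : r;
  for (const errno_pair& p : sketch_table) {
    if (p.wire == a)
      return r < 0 ? -p.host : p.host;
  }
  return r;  // unlisted codes pass through unchanged
}

// Host -> wire, the direction a server applies before sending a reply.
constexpr int32_t sketch_hostos_to_ceph(int32_t r) {
  const int32_t a = r < 0 ? -r : r;
  for (const errno_pair& p : sketch_table) {
    if (p.host == a)
      return r < 0 ? -p.wire : p.wire;
  }
  return r;
}
```

With such a pair, a value should survive a round trip: a server sending -85 puts -125 on the wire, and a FreeBSD client reading -125 gets -85 back.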

--WjW




* Re: Caught the first erroneous translated errorcode
  2017-06-17 20:59   ` Willem Jan Withagen
@ 2017-06-18 17:18     ` Willem Jan Withagen
  2017-06-19 12:56       ` Jason Dillaman
  0 siblings, 1 reply; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-18 17:18 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development, Sage Weil

On 17-6-2017 22:59, Willem Jan Withagen wrote:
> Hi John,
> 
> I think is at the other end I'm working at.
> This commit is about the flags being issued to file open.
> Where I'm sort of suprised since this is Linux <> Linux.
> So perhaps it is all about big-endian <> Little Endian.
> 
> My PR is more about the error code that differ between Linux and
> FreeBSD. So A FreeBSD client would not (correctly) understand the error
> codes that a Linux server issues.
> So I translate all wire error codes to Linux codes on a server
> (hostos_to_ceph_errno_conv()), and in a FreeBSD client I translate the
> wire-error codes into FreeBSD codes. (ceph_to_hostoserrno_conv())
> 
> Specific in this case:
> ECANCELED is 125 on Linux, and is the on wire code.
> So the servers started in the test will signal ECANCELED with value 125,
> but because the rados-stripe code does not translate that back into 85
> (FreeBSD ECANCELED) it is reported as a unkonwn error. Whereas in this
> part of the code ECANCELED is a valid return and is adequately handled.
> 
> So that is why I suspect that the rados code does not take this
> translation into account. Which is not supprising, since only the code
> from OS to wire was available, but the back path was not included until
> I introduced it. But it is hard to find all locations where it should be
> applied.
> 
> So I'm looking for (all) the correct place to insert:
> 	ceph_to_hostos_errno_conv(err)

I think I might have found some locations where the result from the wire
is fetched, but I'm not sure whether this is the correct level:

os/fs/aio.h:42:  int get_return_value() {
libradosstriper/MultiAioCompletionImpl.h:116:  int get_return_value() {
librados/AioCompletionImpl.h:115:  int get_return_value() {
librados/PoolAsyncCompletionImpl.h:59:    int get_return_value() {
librbd/io/AioCompletion.cc:199:ssize_t AioCompletion::get_return_value() {

--WjW




* Re: Caught the first erroneous translated errorcode
  2017-06-18 17:18     ` Willem Jan Withagen
@ 2017-06-19 12:56       ` Jason Dillaman
  2017-06-19 13:00         ` Willem Jan Withagen
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Dillaman @ 2017-06-19 12:56 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: John Spray, Ceph Development, Sage Weil

On Sun, Jun 18, 2017 at 1:18 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> librbd/io/AioCompletion.cc:199:ssize_t AioCompletion::get_return_value() {


librbd just wraps librados, so I would think all the error codes
should have already been properly translated before it reaches this
level since otherwise any internal librbd error logging will output
the incorrect failure reason. I'd suspect most of the client-side
handling should probably be handled inside osdc/Objecter.h/cc..

-- 
Jason


* Re: Caught the first erroneous translated errorcode
  2017-06-19 12:56       ` Jason Dillaman
@ 2017-06-19 13:00         ` Willem Jan Withagen
  2017-06-19 14:31           ` Sage Weil
  0 siblings, 1 reply; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-19 13:00 UTC (permalink / raw)
  To: dillaman; +Cc: John Spray, Ceph Development, Sage Weil

On 19-6-2017 14:56, Jason Dillaman wrote:
> On Sun, Jun 18, 2017 at 1:18 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> librbd/io/AioCompletion.cc:199:ssize_t AioCompletion::get_return_value() {
> 
> 
> librbd just wraps librados, so I would think all the error codes
> should have already been properly translated before it reaches this
> level since otherwise any internal librbd error logging will output
> the incorrect failure reason. I'd suspect most of the client-side
> handling should probably be handled inside osdc/Objecter.h/cc..

Hi Jason,

Thanks for the pointer. Changing any of the librbd code did indeed not
result in a working rados-striper.sh.

Objecter.{h,cc} already had the forward error rewrite. I added the
reverse in the original patch, but obviously that is not enough (yet).
So I'll start digging a bit more in the librados files as you suggested.

--WjW




* Re: Caught the first erroneous translated errorcode
  2017-06-19 13:00         ` Willem Jan Withagen
@ 2017-06-19 14:31           ` Sage Weil
  2017-06-19 14:46             ` Willem Jan Withagen
  0 siblings, 1 reply; 17+ messages in thread
From: Sage Weil @ 2017-06-19 14:31 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: dillaman, John Spray, Ceph Development

On Mon, 19 Jun 2017, Willem Jan Withagen wrote:
> Hi Jason,
> 
> Thanx for the pointer. Changing any of the librbd stuff did indeed not
> result in a working rados-stripper.sh
> 
> Objecter.{h,cc} already had the forward error rewrite. I added the
> reverse in the original patch. But obviously that is not enough (yet)
> So I'll start digging a bit more in the librados files as you suggested.

I think the place to do this is in MOSDOpReply. That alone should be
enough to translate the value as it passes over the wire.

s


* Re: Caught the first erroneous translated errorcode
  2017-06-19 14:31           ` Sage Weil
@ 2017-06-19 14:46             ` Willem Jan Withagen
  2017-06-19 14:55               ` Gregory Farnum
  0 siblings, 1 reply; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-19 14:46 UTC (permalink / raw)
  To: Sage Weil; +Cc: dillaman, John Spray, Ceph Development

On 19-6-2017 at 16:31, Sage Weil wrote:
> I think the place to do this is in MOSDOpReply.. that alone should be
> enough to do the translate as the value passes over the wire.

Hi Sage,

The interesting part of this is that ALL tests but one actually work.
So all tests that start a cluster through vstart do work, EXCEPT for
rados-striper.sh.

This makes me wonder what is different about the striper code that
causes an ECANCELED not to be translated back to the FreeBSD code.

--WjW


* Re: Caught the first erroneous translated errorcode
  2017-06-19 14:46             ` Willem Jan Withagen
@ 2017-06-19 14:55               ` Gregory Farnum
  2017-06-19 15:45                 ` Willem Jan Withagen
  0 siblings, 1 reply; 17+ messages in thread
From: Gregory Farnum @ 2017-06-19 14:55 UTC (permalink / raw)
  To: Willem Jan Withagen
  Cc: Sage Weil, Jason Dillaman, John Spray, Ceph Development

On Mon, Jun 19, 2017 at 7:46 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> Hi Sage,
>
> Tehe interesting part of this is that ALL tests but one actually work. So
> all tests that start
> a cluster thru vstart actually do work. EXCEPT for rados-stiper.sh.
>
> Now this make me question what is different with the stiper code that causes
> an ECANCEL
> to not be translated back ot FreeBSD code.

I'm not sure exactly how it's arranged, but libradosstriper is layered
on top of librados and I don't think anybody's done any of the errno
translation work for other platforms that you got pointed at.
Depending on how it's done that may mean it's missing big chunks --
for instance, if libradosstriper embeds error codes that aren't
touched by librados, it will need to do its own translation.
-Greg


* Re: Caught the first erroneous translated errorcode
  2017-06-19 14:55               ` Gregory Farnum
@ 2017-06-19 15:45                 ` Willem Jan Withagen
  2017-06-19 23:45                   ` Willem Jan Withagen
  0 siblings, 1 reply; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-19 15:45 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, Jason Dillaman, John Spray, Ceph Development

On 19-6-2017 16:55, Gregory Farnum wrote:
> I'm not sure exactly how it's arranged, but libradosstriper is layered
> on top of librados and I don't think anybody's done any of the errno
> translation work for other platforms that you got pointed at.
> Depending on how it's done that may mean it's missing big chunks --
> for instance, if libradosstriper embeds error codes that aren't
> touched by librados, it will need to do its own translation.

Hi Greg,

The error is on the path server -> client.

How do I know: FreeBSD's highest error number at the moment is 96, so a
125 can only be a translated wire code.
ECANCELED is an expected return value in the striper code.
So server-side translation seems to be doing what it should.
The client-side code is:

1260 ./src/libradosstriper/RadosStriperImpl.cc
====
  bl.append(oss.str());
  writeOp.setxattr(XATTR_SIZE, bl);
  rc = m_ioCtx.operate(firstObjOid, &writeOp);
  // return current size
  *size = curSize;
  // handle case where objectsize is already bigger than size
  if (-ECANCELED == rc)
    rc = 0;
  if (rc) {
    unlockObject(soid, *lockCookie);
    lderr(cct()) << "RadosStriperImpl::openStripedObjectForWrite : "
                   << "could not set new size for "
                   << soid << " : rc = " << rc << dendl;
  }
  return rc;
====
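
The check in the excerpt compares rc against the host's own ECANCELED, which is why an untranslated wire value falls through to the error branch. A minimal sketch of that effect, assuming the two values from this thread and a FreeBSD host; finish_set_size and wire_to_host are hypothetical stand-ins, not functions from RadosStriperImpl.cc:

```cpp
#include <cstdint>

constexpr int32_t HOST_ECANCELED = 85;   // assumed: FreeBSD host numbering
constexpr int32_t WIRE_ECANCELED = 125;  // Linux numbering, used on the wire

// Same shape as the check in the excerpt: "canceled" counts as success.
constexpr int32_t finish_set_size(int32_t rc) {
  return (rc == -HOST_ECANCELED) ? 0 : rc;
}

// Hypothetical one-entry translation, wire -> host.
constexpr int32_t wire_to_host(int32_t rc) {
  return (rc == -WIRE_ECANCELED) ? -HOST_ECANCELED : rc;
}

// Untranslated: -125 is not -85, so it is reported as an error.
static_assert(finish_set_size(-WIRE_ECANCELED) == -125,
              "untranslated value falls through");
// Translated first: the existing check handles it.
static_assert(finish_set_size(wire_to_host(-WIRE_ECANCELED)) == 0,
              "translated value is handled");
```

So the check itself looks fine; what matters is that whatever m_ioCtx.operate returns has already been converted to host numbering.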

So I have to drill down into m_ioCtx.operate.
But I'll first look at Sage's suggestion.

--WjW





* Re: Caught the first erroneous translated errorcode
  2017-06-17 17:52 ` John Spray
  2017-06-17 20:59   ` Willem Jan Withagen
@ 2017-06-19 17:29   ` Adam C. Emerson
  2017-06-19 23:17     ` Willem Jan Withagen
  1 sibling, 1 reply; 17+ messages in thread
From: Adam C. Emerson @ 2017-06-19 17:29 UTC (permalink / raw)
  To: The Sacred Order of the Squid Cybernetic

On 17/06/2017, John Spray wrote:
> Somewhat related perhaps: people running cephfs on ARM recently had
> this problem, for that case the solution was simply to define in Ceph
> some constants that mirror the linux ones, see commit 88d2da5e9.

We may wish to consider using std::error_code for Ceph's internal
error stuff rather than trying to shoehorn everything into the POSIX +
Whatever Linux Defines error codes. That way we could have a lot more
specificity and preserve the source of the error while still defining
an equivalence (where appropriate) to POSIX-derived error conditions.
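
As a rough illustration of what that could look like (a sketch only; the enum sketch_osd_errc, its single value, and the category name are invented for this example and do not exist in Ceph), a custom category can keep its own codes and still compare equal to a portable std::errc condition:

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical Ceph-side error enum, made up for this sketch.
enum class sketch_osd_errc { cmpxattr_mismatch = 1 };

class sketch_osd_category final : public std::error_category {
public:
  const char* name() const noexcept override { return "sketch_osd"; }

  std::string message(int ev) const override {
    if (static_cast<sketch_osd_errc>(ev) == sketch_osd_errc::cmpxattr_mismatch)
      return "xattr comparison failed";
    return "unknown sketch_osd error";
  }

  // Defines the equivalence to a POSIX-derived condition where one exists.
  std::error_condition default_error_condition(int ev) const noexcept override {
    if (static_cast<sketch_osd_errc>(ev) == sketch_osd_errc::cmpxattr_mismatch)
      return std::make_error_condition(std::errc::operation_canceled);
    return std::error_condition(ev, *this);
  }
};

inline const std::error_category& sketch_osd_cat() {
  static const sketch_osd_category c;
  return c;
}

inline std::error_code make_error_code(sketch_osd_errc e) {
  return std::error_code(static_cast<int>(e), sketch_osd_cat());
}

namespace std {
template <> struct is_error_code_enum<sketch_osd_errc> : true_type {};
}

// Callers test the portable condition, never a platform-specific number.
inline void sketch_usage() {
  std::error_code ec = sketch_osd_errc::cmpxattr_mismatch;
  assert(ec == std::errc::operation_canceled);
}
```

The comparison goes through the category rather than through the numeric value, so the same test works whether the host defines ECANCELED as 125 or 85.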

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: Caught the first erroneous translated errorcode
  2017-06-19 17:29   ` Adam C. Emerson
@ 2017-06-19 23:17     ` Willem Jan Withagen
  0 siblings, 0 replies; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-19 23:17 UTC (permalink / raw)
  To: The Sacred Order of the Squid Cybernetic

On 19-6-2017 19:29, Adam C. Emerson wrote:
> On 17/06/2017, John Spray wrote:
>> Somewhat related perhaps: people running cephfs on ARM recently had
>> this problem, for that case the solution was simply to define in Ceph
>> some constants that mirror the linux ones, see commit 88d2da5e9.
> 
> We may wish to consider using std::error_code for Ceph's internal
> error stuff rather than trying to shoehorn everything into the POSIX +
> Whatever Linux Defines error codes. That way we could have a lot more
> specificity and preserve the source of the error while still defining
> an equivalence (where appropriate) to POSIX-derived error conditions.
> 

Jesse also suggested using `std::error_code`. Although that would be
a "wire"-neutral system, it would still require identifying the
locations where conversion needs to be done.

So I'm going to focus first on fixing my current problem. Once
that is done it will be clear where to do the conversion; only then
rework the how. So this goes on the long list of nice-to-haves.

--WjW



* Re: Caught the first erroneous translated errorcode
  2017-06-19 15:45                 ` Willem Jan Withagen
@ 2017-06-19 23:45                   ` Willem Jan Withagen
  2017-06-20  2:35                     ` Sage Weil
  0 siblings, 1 reply; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-19 23:45 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, Jason Dillaman, John Spray, Ceph Development

On 19-6-2017 17:45, Willem Jan Withagen wrote:
> Hi Greg,
> 
> The error is on the path server -> client.
> 
> How do I know: FreeBSD highest error number atm is 96.
> ECANCELD is an expected return value in the stiper-code.
> So server-side  translation seems to be doing what it should.
> Client-side code is:
> 
> 1260 ./src/libradosstriper/RadosStriperImpl.cc
> ====
>   bl.append(oss.str());
>   writeOp.setxattr(XATTR_SIZE, bl);
>   rc = m_ioCtx.operate(firstObjOid, &writeOp);
>   // return current size
>   *size = curSize;
>   // handle case where objectsize is already bigger than size
>   if (-ECANCELED == rc)
>     rc = 0;
>   if (rc) {
>     unlockObject(soid, *lockCookie);
>     lderr(cct()) << "RadosStriperImpl::openStripedObjectForWrite : "
>                    << "could not set new size for "
>                    << soid << " : rc = " << rc << dendl;
>   }
>   return rc;
> ====
> 
> So I have ot drill down into m_ioCtx.operate.
> But I'll first look at Sage's suggestion.

I have not been able to find the right spot.
So I upped the logging, and this is the first place where any reference to
-125 is made:
116: 2017-06-20 01:24:21.556950 80fc18800  5 -- 127.0.0.1:0/1969737172 >> 127.0.0.1:6804/60048 conn(0x81065c000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=1). rx osd.1 seq 6 0x810696e00 osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8
116: 2017-06-20 01:24:21.556985 80fc18800  1 -- 127.0.0.1:0/1969737172 <== osd.1 127.0.0.1:6804/60048 6 ==== osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8 ==== 210+0+0 (669224781 0 0) 0x810696e00 con 0x81065c000
116: 2017-06-20 01:24:21.557009 80fc18800 10 client.4115.objecter ms_dispatch 0x80fc33000 osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8
116: 2017-06-20 01:24:21.557024 80fc18800 10 client.4115.objecter in handle_osd_op_reply
116: 2017-06-20 01:24:21.557031 80fc18800  7 client.4115.objecter handle_osd_op_reply 5 ondisk uv 3 in 1.3 attempt 0
116: 2017-06-20 01:24:21.557038 80fc18800 10 client.4115.objecter  op 0 rval -85 len 0
116: 2017-06-20 01:24:21.557043 80fc18800 10 client.4115.objecter  op 1 rval 0 len 0
116: 2017-06-20 01:24:21.557047 80fc18800 15 client.4115.objecter handle_osd_op_reply completed tid 5
116: 2017-06-20 01:24:21.557050 80fc18800 15 client.4115.objecter finish_op 5
116: 2017-06-20 01:24:21.557056 80fc18800 20 client.4115.objecter put_session s=0x810695800 osd=1 4
116: 2017-06-20 01:24:21.557060 80fc18800 15 client.4115.objecter _session_op_remove 1 5
116: 2017-06-20 01:24:21.557073 80fc18800  5 client.4115.objecter 0 in flight
116: 2017-06-20 01:24:21.557085 80fc18800 20 client.4115.objecter put_session s=0x810695800 osd=1 3

This makes me wonder whether this osd_op_reply contains the numeric
error value, or whether it is a formatted text error report of some
event on the server. In the latter case there would already be a
translation problem on the server, and not in the client.

--WjW


* Re: Caught the first erroneous translated errorcode
  2017-06-19 23:45                   ` Willem Jan Withagen
@ 2017-06-20  2:35                     ` Sage Weil
  2017-06-20  8:23                       ` Willem Jan Withagen
  2017-06-20 13:55                       ` Willem Jan Withagen
  0 siblings, 2 replies; 17+ messages in thread
From: Sage Weil @ 2017-06-20  2:35 UTC (permalink / raw)
  To: Willem Jan Withagen
  Cc: Gregory Farnum, Jason Dillaman, John Spray, Ceph Development

Try changing

  int32_t rval;

in OSDOp in osd_types.h to errorcode32_t.
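
The idea behind such a type, as far as it can be sketched without looking at the tree: the field holds host numbering in memory and converts only in encode/decode, so changing the field's type is enough to cover every place it is serialized. The code below is an assumption-laden illustration, not the definition of errorcode32_t; sketch_errorcode32, the byte-vector encoding, and the one-entry translation (125 on the wire, 85 on a FreeBSD host) are all made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical one-entry translation, assuming a FreeBSD host.
constexpr int32_t sketch_host_to_wire(int32_t r) { return r == -85 ? -125 : r; }
constexpr int32_t sketch_wire_to_host(int32_t r) { return r == -125 ? -85 : r; }

// Stand-in for a translating error-code type.
struct sketch_errorcode32 {
  int32_t code = 0;  // always host numbering while in memory

  sketch_errorcode32() = default;
  sketch_errorcode32(int32_t c) : code(c) {}
  operator int32_t() const { return code; }

  // Appends the wire value as 4 little-endian bytes.
  void encode(std::vector<uint8_t>& out) const {
    const uint32_t w = static_cast<uint32_t>(sketch_host_to_wire(code));
    for (int i = 0; i < 4; ++i)
      out.push_back(static_cast<uint8_t>(w >> (8 * i)));
  }

  // Reads 4 little-endian bytes starting at p and converts to host numbering.
  void decode(const uint8_t* p) {
    uint32_t w = 0;
    for (int i = 0; i < 4; ++i)
      w |= static_cast<uint32_t>(p[i]) << (8 * i);
    code = sketch_wire_to_host(static_cast<int32_t>(w));
  }
};

// Encodes a host code and decodes it again, as a sender and receiver would.
inline int32_t sketch_round_trip(int32_t host_code) {
  std::vector<uint8_t> buf;
  sketch_errorcode32(host_code).encode(buf);
  assert(buf.size() == 4);
  sketch_errorcode32 got;
  got.decode(buf.data());
  return got;
}
```

Because the type converts implicitly to and from int32_t, existing code that assigns or compares the field would not need to change.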

sage


On Tue, 20 Jun 2017, Willem Jan Withagen wrote:

> On 19-6-2017 17:45, Willem Jan Withagen wrote:
> > On 19-6-2017 16:55, Gregory Farnum wrote:
> >> On Mon, Jun 19, 2017 at 7:46 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> >>> Op 19-6-2017 om 16:31 schreef Sage Weil:
> >>>>
> >>>> On Mon, 19 Jun 2017, Willem Jan Withagen wrote:
> >>>>>
> >>>>> On 19-6-2017 14:56, Jason Dillaman wrote:
> >>>>>>
> >>>>>> On Sun, Jun 18, 2017 at 1:18 PM, Willem Jan Withagen <wjw@digiware.nl>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> librbd/io/AioCompletion.cc:199:ssize_t
> >>>>>>> AioCompletion::get_return_value() {
> >>>>>>
> >>>>>>
> >>>>>> librbd just wraps librados, so I would think all the error codes
> >>>>>> should have already been properly translated before it reaches this
> >>>>>> level since otherwise any internal librbd error logging will output
> >>>>>> the incorrect failure reason. I'd suspect most of the client-side
> >>>>>> handling should probably be handled inside osdc/Objecter.h/cc..
> >>>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> Thanks for the pointer. Changing the librbd code did indeed not
> >>>>> result in a working rados-striper.sh.
> >>>>>
> >>>>> Objecter.{h,cc} already had the forward error rewrite. I added the
> >>>>> reverse in the original patch, but obviously that is not enough (yet).
> >>>>> So I'll start digging a bit more in the librados files as you suggested.
> >>>>
> >>>> I think the place to do this is in MOSDOpReply.. that alone should be
> >>>> enough to do the translate as the value passes over the wire.
> >>>
> >>>
> >>> Hi Sage,
> >>>
> >>> The interesting part is that ALL tests but one actually work: every
> >>> test that starts a cluster through vstart works, EXCEPT for
> >>> rados-striper.sh.
> >>>
> >>> This makes me wonder what is different about the striper code that
> >>> causes an ECANCELED not to be translated back to the FreeBSD code.
> >>
> >> I'm not sure exactly how it's arranged, but libradosstriper is layered
> >> on top of librados and I don't think anybody's done any of the errno
> >> translation work for other platforms that you got pointed at.
> >> Depending on how it's done that may mean it's missing big chunks --
> >> for instance, if libradosstriper embeds error codes that aren't
> >> touched by librados, it will need to do its own translation.
> > 
> > Hi Greg,
> > 
> > The error is on the path server -> client.
> > 
> > How do I know? The highest FreeBSD error number is currently 96, so
> > 125 cannot be a native FreeBSD value.
> > ECANCELED is an expected return value in the striper code,
> > so server-side translation seems to be doing what it should.
> > Client-side code is:
> > 
> > 1260 ./src/libradosstriper/RadosStriperImpl.cc
> > ====
> >   bl.append(oss.str());
> >   writeOp.setxattr(XATTR_SIZE, bl);
> >   rc = m_ioCtx.operate(firstObjOid, &writeOp);
> >   // return current size
> >   *size = curSize;
> >   // handle case where objectsize is already bigger than size
> >   if (-ECANCELED == rc)
> >     rc = 0;
> >   if (rc) {
> >     unlockObject(soid, *lockCookie);
> >     lderr(cct()) << "RadosStriperImpl::openStripedObjectForWrite : "
> >                    << "could not set new size for "
> >                    << soid << " : rc = " << rc << dendl;
> >   }
> >   return rc;
> > ====
> > 
> > So I have to drill down into m_ioCtx.operate.
> > But I'll first look at Sage's suggestion.
> 
> I have not been able to find the right spot yet, so I turned up the
> logging. This is the first place where -125 is mentioned:
> 116: 2017-06-20 01:24:21.556950 80fc18800  5 -- 127.0.0.1:0/1969737172 >> 127.0.0.1:6804/60048 conn(0x81065c000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=1). rx osd.1 seq 6 0x810696e00 osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8
> 116: 2017-06-20 01:24:21.556985 80fc18800  1 -- 127.0.0.1:0/1969737172 <== osd.1 127.0.0.1:6804/60048 6 ==== osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8 ==== 210+0+0 (669224781 0 0) 0x810696e00 con 0x81065c000
> 116: 2017-06-20 01:24:21.557009 80fc18800 10 client.4115.objecter ms_dispatch 0x80fc33000 osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8
> 116: 2017-06-20 01:24:21.557024 80fc18800 10 client.4115.objecter in handle_osd_op_reply
> 116: 2017-06-20 01:24:21.557031 80fc18800  7 client.4115.objecter handle_osd_op_reply 5 ondisk uv 3 in 1.3 attempt 0
> 116: 2017-06-20 01:24:21.557038 80fc18800 10 client.4115.objecter  op 0 rval -85 len 0
> 116: 2017-06-20 01:24:21.557043 80fc18800 10 client.4115.objecter  op 1 rval 0 len 0
> 116: 2017-06-20 01:24:21.557047 80fc18800 15 client.4115.objecter handle_osd_op_reply completed tid 5
> 116: 2017-06-20 01:24:21.557050 80fc18800 15 client.4115.objecter finish_op 5
> 116: 2017-06-20 01:24:21.557056 80fc18800 20 client.4115.objecter put_session s=0x810695800 osd=1 4
> 116: 2017-06-20 01:24:21.557060 80fc18800 15 client.4115.objecter _session_op_remove 1 5
> 116: 2017-06-20 01:24:21.557073 80fc18800  5 client.4115.objecter 0 in flight
> 116: 2017-06-20 01:24:21.557085 80fc18800 20 client.4115.objecter put_session s=0x810695800 osd=1 3
> 
> This makes me wonder whether this osd_reply contains the numeric error
> value, or whether it is a formatted text report of some event on the
> server, in which case the translation problem is already on the server
> and not in the client.
> 
> --WjW
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


* Re: Caught the first erroneous translated errorcode
  2017-06-20  2:35                     ` Sage Weil
@ 2017-06-20  8:23                       ` Willem Jan Withagen
  2017-06-20  9:27                         ` Willem Jan Withagen
  2017-06-20 13:55                       ` Willem Jan Withagen
  1 sibling, 1 reply; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-20  8:23 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Jason Dillaman, John Spray, Ceph Development

On 20-6-2017 04:35, Sage Weil wrote:
> Try changing
> 
>   int32_t rval;
> 
> in OSDOp in osd_types.h to errorcode32_t.

Nice suggestion, and I think it is a correct one.
But I'm still getting -125 as error code.

--WjW


* Re: Caught the first erroneous translated errorcode
  2017-06-20  8:23                       ` Willem Jan Withagen
@ 2017-06-20  9:27                         ` Willem Jan Withagen
  0 siblings, 0 replies; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-20  9:27 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Jason Dillaman, John Spray, Ceph Development

On 20-6-2017 10:23, Willem Jan Withagen wrote:
> On 20-6-2017 04:35, Sage Weil wrote:
>> Try changing
>>
>>   int32_t rval;
>>
>> in OSDOp in osd_types.h to errorcode32_t.
> 
> Nice suggestion, and I think it is a correct one.
> But I'm still getting -125 as error code.

I think this is the suspect part?
  for (unsigned i = 0;
       p != out_ops.end() && pb != op->out_bl.end();
       ++i, ++p, ++pb, ++pr, ++ph) {
    ldout(cct, 10) << " op " << i << " rval " << p->rval
                   << " len " << p->outdata.length() << dendl;
    if (*pb)
      **pb = p->outdata;
    // set rval before running handlers so that handlers
    // can change it if e.g. decoding fails
    if (*pr) {
      **pr = ceph_to_hostos_errno(p->rval);
      ldout(cct, 10) << "after  ceph_to_hostos_errno **pr: " << **pr
                     << dendl;
    }
    if (*ph) {
      ldout(cct, 10) << " op " << i << " handler " << *ph << dendl;
      (*ph)->complete(ceph_to_hostos_errno(p->rval));
      *ph = NULL;
    }
  }

where it generates:
116: 2017-06-20 11:20:15.365583 80fc18800 10 client.4115.objecter:handle_osd_op_reply(3461) op 0 rval -125 len 0
116: 2017-06-20 11:20:15.365591 80fc18800 10 client.4115.objecter:handle_osd_op_reply(3461) op 1 rval 0 len 0

So neither the if (*pr) nor the if (*ph) branch is taken,
and thus the error is not translated.
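
The shape of the problem can be reproduced outside Ceph. The sketch below
is a simplified stand-in, not the Objecter code: to_host_errno,
OpResult and deliver_results are hypothetical names, and only ECANCELED is
mapped. It shows that when translation lives inside "if an output slot was
supplied", an op without a slot keeps its wire value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int32_t kWireEcanceled = 125;  // Linux value seen in the log
constexpr int32_t kHostEcanceled = 85;   // FreeBSD value

// Hypothetical stand-in for ceph_to_hostos_errno(); maps one code only.
inline int32_t to_host_errno(int32_t rval) {
  if (rval == -kWireEcanceled) return -kHostEcanceled;
  if (rval == kWireEcanceled) return kHostEcanceled;
  return rval;
}

// Per-op result as it was decoded from the reply.
struct OpResult { int32_t rval; };

// Same structure as the loop above: translate only into slots the caller
// provided. A null slot means nothing is written and nothing is translated.
inline void deliver_results(const std::vector<OpResult>& ops,
                            const std::vector<int32_t*>& out_rval) {
  assert(ops.size() == out_rval.size());
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (out_rval[i]) {
      *out_rval[i] = to_host_errno(ops[i].rval);
    }
  }
}
```

Any code path that later reads the per-op rval directly, rather than through
one of the slots, would still see -125 in this model.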

Note that this part was already in the original code before I started
working on this.

--WjW



* Re: Caught the first erroneous translated errorcode
  2017-06-20  2:35                     ` Sage Weil
  2017-06-20  8:23                       ` Willem Jan Withagen
@ 2017-06-20 13:55                       ` Willem Jan Withagen
  1 sibling, 0 replies; 17+ messages in thread
From: Willem Jan Withagen @ 2017-06-20 13:55 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, Jason Dillaman, John Spray, Ceph Development

On 20-6-2017 04:35, Sage Weil wrote:
> Try changing
> 
>   int32_t rval;
> 
> in OSDOp in osd_types.h to errorcode32_t.

That was essentially the correct pointer.
I needed to fix my own code: I wrote it late at night and obviously wasn't
too fresh at that point.
But it seems to work now.

https://github.com/ceph/ceph/pull/15780

Thanks for all the help and TLC while tracking this down.

--WjW


Thread overview: 17+ messages
2017-06-17 10:50 Caught the first erroneous translated errorcode Willem Jan Withagen
2017-06-17 17:52 ` John Spray
2017-06-17 20:59   ` Willem Jan Withagen
2017-06-18 17:18     ` Willem Jan Withagen
2017-06-19 12:56       ` Jason Dillaman
2017-06-19 13:00         ` Willem Jan Withagen
2017-06-19 14:31           ` Sage Weil
2017-06-19 14:46             ` Willem Jan Withagen
2017-06-19 14:55               ` Gregory Farnum
2017-06-19 15:45                 ` Willem Jan Withagen
2017-06-19 23:45                   ` Willem Jan Withagen
2017-06-20  2:35                     ` Sage Weil
2017-06-20  8:23                       ` Willem Jan Withagen
2017-06-20  9:27                         ` Willem Jan Withagen
2017-06-20 13:55                       ` Willem Jan Withagen
2017-06-19 17:29   ` Adam C. Emerson
2017-06-19 23:17     ` Willem Jan Withagen
