All of lore.kernel.org
 help / color / mirror / Atom feed
* The incomplete result of mapper GetSubTree/Paths
@ 2022-05-20  3:14 Lei Yu
  2022-05-20 21:13 ` Patrick Williams
  2022-05-23 16:19 ` Ed Tanous
  0 siblings, 2 replies; 7+ messages in thread
From: Lei Yu @ 2022-05-20  3:14 UTC (permalink / raw)
  To: openbmc; +Cc: Ed Tanous

This email is to describe an issue in mapper that the
GetSubTree/GetSubTreePaths could return incomplete results when it's
doing introspect.

Steps to reproduce the issue:
1. Configure phosphor-logging to get max 1000 entries. (with
-Derror_info_cap=1000 meson option)
2. Create 1000 logging entries.
3. Call GetSubTreePaths and make sure it gets the correct 1000 entries:
    # busctl call "xyz.openbmc_project.ObjectMapper"
"/xyz/openbmc_project/object_mapper"
"xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
/xyz/openbmc_project/logging/entry 0 1
xyz.openbmc_project.Logging.Entry | awk '{print $2;}'
    1000
4. Restart logging service
    # systemctl restart xyz.openbmc_project.Logging.service
5. After the service is restarted, call GetSubTreePaths for multiple
times in the short time (e.g. within 10 seconds)
    # busctl call "xyz.openbmc_project.ObjectMapper"
"/xyz/openbmc_project/object_mapper"
"xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
/xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
'{print $2;}'
    47
    # busctl call "xyz.openbmc_project.ObjectMapper"
"/xyz/openbmc_project/object_mapper"
"xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
/xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
'{print $2;}'
    375
    # busctl call "xyz.openbmc_project.ObjectMapper"
"/xyz/openbmc_project/object_mapper"
"xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
/xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
'{print $2;}'
    851
    # busctl call "xyz.openbmc_project.ObjectMapper"
"/xyz/openbmc_project/object_mapper"
"xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
/xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
'{print $2;}'
    1000

We can see that the result of GetSubTreePaths is increasing until it gets 1000.
This actually happens when mapper is doing introspect of the logging
service, and getting more and more objects.

The above "incomplete" behavior will impact the logic that depends on
the result of GetSubTreePaths.
E.g. in ipmid, the "cached SEL" feature depends on the reliable result
of GetSubTreePath, to get the number of current logging entries. If
it's not correct, ipmid will not know the "missed" entries.

The question is, should we make sure mapper returns the "stable"
result in the above case?
When it's doing introspect of a service (e.g. nameOwnerChanged), it
could throw if the service is not fully introspected, and only return
the "correct" result after the service is fully introspected.

If mapper could not guarantee the stable result, the service calling
mapper will have to add more complex logic to make sure it gets the
"full and correct" result.

-- 
BRs,
Lei YU

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The incomplete result of mapper GetSubTree/Paths
  2022-05-20  3:14 The incomplete result of mapper GetSubTree/Paths Lei Yu
@ 2022-05-20 21:13 ` Patrick Williams
  2022-05-23  5:41   ` [External] " Lei Yu
  2022-05-23 16:26   ` Ed Tanous
  2022-05-23 16:19 ` Ed Tanous
  1 sibling, 2 replies; 7+ messages in thread
From: Patrick Williams @ 2022-05-20 21:13 UTC (permalink / raw)
  To: Lei Yu; +Cc: openbmc, Ed Tanous

[-- Attachment #1: Type: text/plain, Size: 613 bytes --]

On Fri, May 20, 2022 at 11:14:15AM +0800, Lei Yu wrote:
 
> If mapper could not guarantee the stable result, the service calling
> mapper will have to add more complex logic to make sure it gets the
> "full and correct" result.

The issue here is around causality.  There really isn't any way to
correctly kick this logic out to applications no matter how complex you
make the implementation.

The original mapper implementation was causally ordered but this was
lost in the conversion to C/C++.  We should look at getting back to
having mapper give causal order guarantees.

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [External] Re: The incomplete result of mapper GetSubTree/Paths
  2022-05-20 21:13 ` Patrick Williams
@ 2022-05-23  5:41   ` Lei Yu
  2022-05-23 16:26   ` Ed Tanous
  1 sibling, 0 replies; 7+ messages in thread
From: Lei Yu @ 2022-05-23  5:41 UTC (permalink / raw)
  To: Patrick Williams; +Cc: openbmc, Ed Tanous

On Sat, May 21, 2022 at 5:13 AM Patrick Williams <patrick@stwcx.xyz> wrote:
>
> On Fri, May 20, 2022 at 11:14:15AM +0800, Lei Yu wrote:
>
> > If mapper could not guarantee the stable result, the service calling
> > mapper will have to add more complex logic to make sure it gets the
> > "full and correct" result.
>
> The issue here is around causality.  There really isn't any way to
> correctly kick this logic out to applications no matter how complex you
> make the implementation.
>
> The original mapper implementation was causally ordered but this was
> lost in the conversion to C/C++.  We should look at getting back to
> having mapper give causal order guarantees.

A patch is sent to gerrit that is an attempt to make GetSubTreePaths
throw when it's doing introspect and return the "complete" result.
https://gerrit.openbmc.org/c/openbmc/phosphor-objmgr/+/53904

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The incomplete result of mapper GetSubTree/Paths
  2022-05-20  3:14 The incomplete result of mapper GetSubTree/Paths Lei Yu
  2022-05-20 21:13 ` Patrick Williams
@ 2022-05-23 16:19 ` Ed Tanous
  2022-05-24  2:58   ` [External] " Lei Yu
  1 sibling, 1 reply; 7+ messages in thread
From: Ed Tanous @ 2022-05-23 16:19 UTC (permalink / raw)
  To: Lei Yu; +Cc: openbmc

[-- Attachment #1: Type: text/plain, Size: 4208 bytes --]

On Thu, May 19, 2022 at 8:14 PM Lei Yu <yulei.sh@bytedance.com> wrote:

> This email is to describe an issue in mapper that the
> GetSubTree/GetSubTreePaths could return incomplete results when it's
> doing introspect.
>
> Steps to reproduce the issue:
> 1. Configure phosphor-logging to get max 1000 entries. (with
> -Derror_info_cap=1000 meson option)
> 2. Create 1000 logging entries.
> 3. Call GetSubTreePaths and make sure it gets the correct 1000 entries:
>     # busctl call "xyz.openbmc_project.ObjectMapper"
> "/xyz/openbmc_project/object_mapper"
> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
> /xyz/openbmc_project/logging/entry 0 1
> xyz.openbmc_project.Logging.Entry | awk '{print $2;}'
>     1000
> 4. Restart logging service
>     # systemctl restart xyz.openbmc_project.Logging.service
> 5. After the service is restarted, call GetSubTreePaths for multiple
> times in the short time (e.g. within 10 seconds)
>     # busctl call "xyz.openbmc_project.ObjectMapper"
> "/xyz/openbmc_project/object_mapper"
> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
> '{print $2;}'
>     47
>     # busctl call "xyz.openbmc_project.ObjectMapper"
> "/xyz/openbmc_project/object_mapper"
> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
> '{print $2;}'
>     375
>     # busctl call "xyz.openbmc_project.ObjectMapper"
> "/xyz/openbmc_project/object_mapper"
> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
> '{print $2;}'
>     851
>     # busctl call "xyz.openbmc_project.ObjectMapper"
> "/xyz/openbmc_project/object_mapper"
> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
> '{print $2;}'
>     1000
>
> We can see that the result of GetSubTreePaths is increasing until it gets
> 1000.
> This actually happens when mapper is doing introspect of the logging
> service, and getting more and more objects.

The above "incomplete" behavior will impact the logic that depends on
> the result of GetSubTreePaths.
> E.g. in ipmid, the "cached SEL" feature depends on the reliable result
> of GetSubTreePath, to get the number of current logging entries.


I'm not really following why this is a problem, and sounds like a bug in
the cached SEL feature.  Dbus objects can get added or removed at any time,
including logging objects, if the cached sel feature isn't handling all the
added/removed signals as it should, that really should be fixed.  Can you
go into more detail about what this actually causes?  Is it that logging
objects are getting added out of order?


> If
> it's not correct, ipmid will not know the "missed" entries.
>

What is a "missed" entry in this context?


>
> The question is, should we make sure mapper returns the "stable"
> result in the above case?
>

I don't think so, at least, when it was built, it wasn't designed to return
atomically created results, but given that ANY object on dbus can be added
or removed at any time, it's not clear why this is a problem in the mapper
itself.


> When it's doing introspect of a service (e.g. nameOwnerChanged), it
> could throw if the service is not fully introspected, and only return
> the "correct" result after the service is fully introspected.


> If mapper could not guarantee the stable result, the service calling
> mapper will have to add more complex logic to make sure it gets the
> "full and correct" result.
>

I left comments on your code review as well, but please don't do this.
This is going to force retry code into EVERY service that relies on the
mapper, at EVERY mapper call site.  If the current behavior is truly a
problem, I would much rather the mapper support multi-versioned atomically
create copies of the dbus-map, and have the mapper expose the "reliable"
interface, rather than spreading that throughout the system.  For context,
bmcweb alone has dozens of call sites to the mapper that would need to be
fixed if it were made to be possible for it to fail in this way.


>
> --
> BRs,
> Lei YU
>

[-- Attachment #2: Type: text/html, Size: 5993 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The incomplete result of mapper GetSubTree/Paths
  2022-05-20 21:13 ` Patrick Williams
  2022-05-23  5:41   ` [External] " Lei Yu
@ 2022-05-23 16:26   ` Ed Tanous
  2022-05-23 19:28     ` Patrick Williams
  1 sibling, 1 reply; 7+ messages in thread
From: Ed Tanous @ 2022-05-23 16:26 UTC (permalink / raw)
  To: Patrick Williams; +Cc: openbmc, Lei Yu

[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]

On Fri, May 20, 2022 at 2:13 PM Patrick Williams <patrick@stwcx.xyz> wrote:

> On Fri, May 20, 2022 at 11:14:15AM +0800, Lei Yu wrote:
>
> > If mapper could not guarantee the stable result, the service calling
> > mapper will have to add more complex logic to make sure it gets the
> > "full and correct" result.
>
> The issue here is around causality.  There really isn't any way to
> correctly kick this logic out to applications no matter how complex you
> make the implementation.
>
> The original mapper implementation was causally ordered but this was
> lost in the conversion to C/C++.  We should look at getting back to
> having mapper give causal order guarantees.
>

I'm not really following what causality has to do with it in Leis case.
From my read, it looks like this is a case of "introspect takes some time
to complete, and the results for one daemon aren't expected to be added
atomically".  This exact same race is present any time an object is added
or removed from dbus, so you're right, there isn't really a way to avoid
it, aside from writing implementations that don't fail if the object count
on a connection doesn't match what the mapper gave you, which is something
we "learned" in bmcweb a long time ago.

One workaround here to cover the startup case would be to simply batch
together all objects for a given daemon in the InProgressIntrospect object
such that all of a daemons dbus paths are added to the global object
atomically, so we'd avoid the race in the startup case, but it would still
exist when new objects are created, so it doesn't completely solve the
problem, just solves it for the case above.


>
> --
> Patrick Williams
>

[-- Attachment #2: Type: text/html, Size: 2296 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The incomplete result of mapper GetSubTree/Paths
  2022-05-23 16:26   ` Ed Tanous
@ 2022-05-23 19:28     ` Patrick Williams
  0 siblings, 0 replies; 7+ messages in thread
From: Patrick Williams @ 2022-05-23 19:28 UTC (permalink / raw)
  To: Ed Tanous; +Cc: openbmc, Lei Yu

[-- Attachment #1: Type: text/plain, Size: 3056 bytes --]

On Mon, May 23, 2022 at 09:26:43AM -0700, Ed Tanous wrote:
> On Fri, May 20, 2022 at 2:13 PM Patrick Williams <patrick@stwcx.xyz> wrote:
> 
> > On Fri, May 20, 2022 at 11:14:15AM +0800, Lei Yu wrote:

> > The issue here is around causality.  There really isn't any way to
> > correctly kick this logic out to applications no matter how complex you
> > make the implementation.
> >
> > The original mapper implementation was causally ordered but this was
> > lost in the conversion to C/C++.  We should look at getting back to
> > having mapper give causal order guarantees.
> >
> 
> I'm not really following what causality has to do with it in Leis case.
> From my read, it looks like this is a case of "introspect takes some time
> to complete, and the results for one daemon aren't expected to be added
> atomically".  

This is exactly a causality problem.

The logging daemon is creating 1000 objects _before_ advertising the
service.  It then advertises the service, which everyone hears, and then
mapper starts introspection.  If I ask mapper how many objects logging
has it should tell me 0 or 1000.  There was never a state when logger
hosted 438 objects, so I better not get that as an answer.

It is even worse than this (and a stronger demonstration of causality).
Suppose we had (and I'm making this up because it is an easy
demonstration):

    - logger starts up and projects 1000 objects.
    - inventory recognizes logger came back and instantiates an
      association.
    - power handling hears the association signal and queries mapper to
      get information on the corresponding logger entries.

In this case the paths...
    Logger -> Mapper -> Power Handling
           |
           -> Inventory -> Mapper -> Power Handling

give results as if the inventory association happens first, which breaks
the causality relationships.

> This exact same race is present any time an object is added
> or removed from dbus, so you're right, there isn't really a way to avoid
> it, aside from writing implementations that don't fail if the object count
> on a connection doesn't match what the mapper gave you, which is something
> we "learned" in bmcweb a long time ago.

Adding and removing isn't a causality breakage, so it isn't a problem.
You can't expect objects to be steady state and that is fine, but the
paths through dbus should preserve causality relationships.

> One workaround here to cover the startup case would be to simply batch
> together all objects for a given daemon in the InProgressIntrospect object
> such that all of a daemons dbus paths are added to the global object
> atomically, so we'd avoid the race in the startup case, but it would still
> exist when new objects are created, so it doesn't completely solve the
> problem, just solves it for the case above.

I don't know how this solves causality, unless mapper gives
InProgressIntrospect for every request after the signal comes in, until
introspection is complete.

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [External] Re: The incomplete result of mapper GetSubTree/Paths
  2022-05-23 16:19 ` Ed Tanous
@ 2022-05-24  2:58   ` Lei Yu
  0 siblings, 0 replies; 7+ messages in thread
From: Lei Yu @ 2022-05-24  2:58 UTC (permalink / raw)
  To: Ed Tanous; +Cc: openbmc

On Tue, May 24, 2022 at 12:19 AM Ed Tanous <edtanous@google.com> wrote:
>
>
>
> On Thu, May 19, 2022 at 8:14 PM Lei Yu <yulei.sh@bytedance.com> wrote:
>>
>> This email is to describe an issue in mapper that the
>> GetSubTree/GetSubTreePaths could return incomplete results when it's
>> doing introspect.
>>
>> Steps to reproduce the issue:
>> 1. Configure phosphor-logging to get max 1000 entries. (with
>> -Derror_info_cap=1000 meson option)
>> 2. Create 1000 logging entries.
>> 3. Call GetSubTreePaths and make sure it gets the correct 1000 entries:
>>     # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/logging/entry 0 1
>> xyz.openbmc_project.Logging.Entry | awk '{print $2;}'
>>     1000
>> 4. Restart logging service
>>     # systemctl restart xyz.openbmc_project.Logging.service
>> 5. After the service is restarted, call GetSubTreePaths for multiple
>> times in the short time (e.g. within 10 seconds)
>>     # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
>> '{print $2;}'
>>     47
>>     # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
>> '{print $2;}'
>>     375
>>     # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
>> '{print $2;}'
>>     851
>>     # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
>> '{print $2;}'
>>     1000
>>
>> We can see that the result of GetSubTreePaths is increasing until it gets 1000.
>> This actually happens when mapper is doing introspect of the logging
>> service, and getting more and more objects.
>>
>> The above "incomplete" behavior will impact the logic that depends on
>> the result of GetSubTreePaths.
>> E.g. in ipmid, the "cached SEL" feature depends on the reliable result
>> of GetSubTreePath, to get the number of current logging entries.
>
>
> I'm not really following why this is a problem, and sounds like a bug in the cached SEL feature.  Dbus objects can get added or removed at any time, including logging objects, if the cached sel feature isn't handling all the added/removed signals as it should, that really should be fixed.  Can you go into more detail about what this actually causes?  Is it that logging objects are getting added out of order?
>

As Patrick also mentioned, the issue is that
1. When a service creates 1000 objects _before_ advertising the
service to the DBus.
2. When it advertises itself, the other services are expected to see
there are 1000 objects, _without_ `interfacesAdded` signal.
3. The reality is that mapper gives a random number between 0~1000
during the introspection.
4. The side effect is that if a service is interested and ask mapper
for the object, it gets incomplete result, and there are no
`interfacesAdded` signal at all, and thus it has the incomplete data
of the DBus objets.

>>
>> If
>> it's not correct, ipmid will not know the "missed" entries.
>
>
> What is a "missed" entry in this context?

See the above example, where the `GetSubTreePaths` API gives result of
47, 375, 851, while the expectation is 1000.

>>
>>
>> The question is, should we make sure mapper returns the "stable"
>> result in the above case?
>
>
> I don't think so, at least, when it was built, it wasn't designed to return atomically created results, but given that ANY object on dbus can be added or removed at any time, it's not clear why this is a problem in the mapper itself.

When an object is added or removed, either it emits
interfacesAdded/Removed signal, or it emits nameOwnerChanged signal.
Both will mapper to do the introspect and get the latest objects. And
when it is doing introspection, it is expected *NOT* to return
"incomplete" result. Otherwise services relying on mapper will have
chance to get incomplete view of the DBus objects.

>
>>
>> When it's doing introspect of a service (e.g. nameOwnerChanged), it
>> could throw if the service is not fully introspected, and only return
>> the "correct" result after the service is fully introspected.
>>
>>
>> If mapper could not guarantee the stable result, the service calling
>> mapper will have to add more complex logic to make sure it gets the
>> "full and correct" result.
>
>
> I left comments on your code review as well, but please don't do this.  This is going to force retry code into EVERY service that relies on the mapper, at EVERY mapper call site.  If the current behavior is truly a problem, I would much rather the mapper support multi-versioned atomically create copies of the dbus-map, and have the mapper expose the "reliable" interface, rather than spreading that throughout the system.  For context, bmcweb alone has dozens of call sites to the mapper that would need to be fixed if it were made to be possible for it to fail in this way.

The patch is POC to show the problem and a possible fix. Throwing
`ResourceNotFound` may not be a good fix.
But it shall tell the caller about the status of the result, possibly:
* Throwing an "InIntrospect" exception to indicate it's doing
introspecting on the object path. This is similar to what my patch
does, it's just using a more meaningful exception.
* Add a flag in the result to indicate that the result is not
complete. Then the caller who cares could check the flag, and retry
until it gets a complete result.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-05-24  2:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-20  3:14 The incomplete result of mapper GetSubTree/Paths Lei Yu
2022-05-20 21:13 ` Patrick Williams
2022-05-23  5:41   ` [External] " Lei Yu
2022-05-23 16:26   ` Ed Tanous
2022-05-23 19:28     ` Patrick Williams
2022-05-23 16:19 ` Ed Tanous
2022-05-24  2:58   ` [External] " Lei Yu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.