openbmc.lists.ozlabs.org archive mirror
* Checking for network online
@ 2022-02-17 22:54 Johnathan Mantey
  2022-02-18  0:11 ` Jeremy Kerr
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Johnathan Mantey @ 2022-02-17 22:54 UTC (permalink / raw)
  To: OpenBMC Maillist



Intel has recently run into an issue regarding a systemd service, and
we're interested in soliciting feedback from the community.

Issue: systemd-networkd-wait-online.service stalls for 120 seconds when 
the managed NICs do not have a network connection.

TLDR: Should OpenBMC remove systemd-networkd-wait-online.service 
universally?

System Config: None of the NICs in the system is connected to an active
network.

Test Process: The system under test (SUT) has AC removed, and some time
later AC applied. Wait for BMC/BIOS to boot.

Behavior: U-Boot hands control to the Linux boot process, and the 
systemd services are started. When systemd-networkd-wait-online.service 
starts it stalls waiting for the NICs to enter a fully functional state. 
This never happens during the default 120 second timeout period for this 
service. When the timeout elapses, an error message is logged to the 
journal reporting the service exited unsuccessfully.

Issues: This service blocks entry to multi-user.target.
phosphor-state-manager uses multi-user.target to report the BMC is ready 
to use.
This is reported via IPMI Get Device ID.
The Intel BIOS is blocked from booting until 
systemd-networkd-wait-online times out.
BMC entry to multi-user.target is delayed. Journal entries are created.

Question for the community: Given the negative side effects caused by
running this service, does the community want to have this service
removed from the global build image?

-- 
Johnathan Mantey
Senior Software Engineer
azad technology partners
Contributing to Technology Innovation since 1992
Phone: (503) 712-6764
Email: johnathanx.mantey@intel.com



* Re: Checking for network online
  2022-02-17 22:54 Checking for network online Johnathan Mantey
@ 2022-02-18  0:11 ` Jeremy Kerr
  2022-02-18  2:29   ` Lei Yu
  2022-02-18 19:04 ` Doman, Jonathan
  2022-03-02  6:24 ` Ratan Gupta
  2 siblings, 1 reply; 18+ messages in thread
From: Jeremy Kerr @ 2022-02-18  0:11 UTC (permalink / raw)
  To: Johnathan Mantey, OpenBMC Maillist

Hi Johnathan,

> Issue: systemd-networkd-wait-online.service stalls for 120 seconds
> when the managed NICs do not have a network connection.
> 
> TLDR: Should OpenBMC remove systemd-networkd-wait-online.service 
> universally?

Probably not, it's required to implement network-online.target, which
is standard, and may be referred to by arbitrary packages.

There's some good background on the issues you're experiencing here:

 https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

in short: most services should be able to start before network-
online.target, and be able to adapt to changes in network configuration
after that point.

For your specific issue there:

> Issues: This service blocks entry to multi-user.target.
> phosphor-state-manager uses multi-user.target to report the BMC is
> ready to use.
> This is reported via IPMI Get Device ID.

That sounds like more of an issue of whether that reported state
actually represents the expected BMC state...

Regards,


Jeremy


* Re: Checking for network online
  2022-02-18  0:11 ` Jeremy Kerr
@ 2022-02-18  2:29   ` Lei Yu
  2022-02-18 16:11     ` Johnathan Mantey
  0 siblings, 1 reply; 18+ messages in thread
From: Lei Yu @ 2022-02-18  2:29 UTC (permalink / raw)
  To: Jeremy Kerr; +Cc: OpenBMC Maillist, Johnathan Mantey

On Fri, Feb 18, 2022 at 8:11 AM Jeremy Kerr <jk@codeconstruct.com.au> wrote:
>
> Hi Johnathan,
>
> > Issue: systemd-networkd-wait-online.service stalls for 120 seconds
> > when the managed NICs do not have a network connection.
> >
> > TLDR: Should OpenBMC remove systemd-networkd-wait-online.service
> > universally?
>
> Probably not, it's required to implement network-online.target, which
> is standard, and may be referred to by arbitrary packages.
>
> There's some good background on the issues you're experiencing here:
>
>  https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
>
> in short: most services should be able to start before network-
> online.target, and be able to adapt to changes in network configuration
> after that point.
>
> For your specific issue there:
>
> > Issues: This service blocks entry to multi-user.target.
> > phosphor-state-manager uses multi-user.target to report the BMC is
> > ready to use.
> > This is reported via IPMI Get Device ID.
>
> That sounds like more of an issue of whether that reported state
> actually represents the expected BMC state...

We have an internal "override" config to start
systemd-networkd-wait-online with --any option:

 # override.conf
 [Service]
 ExecStart=
 ExecStart=/lib/systemd/systemd-networkd-wait-online --any

This is mostly about fixing the QEMU CI.
In the real environment the network *should* be up and online so the
above makes systemd-networkd-wait-online start OK in both cases.


* Re: Checking for network online
  2022-02-18  2:29   ` Lei Yu
@ 2022-02-18 16:11     ` Johnathan Mantey
  2022-02-23  2:09       ` Jiaqing Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Johnathan Mantey @ 2022-02-18 16:11 UTC (permalink / raw)
  To: Lei Yu, Jeremy Kerr; +Cc: OpenBMC Maillist



Reading the description of the --any switch in the
systemd-networkd-wait-online man page, it doesn't look like it would be
helpful. That flag permits the service to
move on when one of the NICs achieves 'online' functionality. In the 
case of a NIC w/o a cable connection 'online' never happens. Thus the 
default 120 second timeout is still going to elapse, BMC ready is going 
to be held off, BIOS is going to delay completion (in our BIOS), and an 
error message is still going to be logged.

It appears, based on comments so far, that my best way forward with the 
current implementation of wait-online is to assign "--timeout 
<number-smaller-than-120> -q" to reduce the amount of time for testing 
the NIC state, and to never log an error because the NIC was unplugged.

Gating on operational state, and issuing --ignore flags didn't work, 
leaving a large blunt instrument for a solution.
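
A minimal sketch of the shorter-timeout override described above, written as a shell helper that installs a systemd drop-in. The helper name `install_wait_online_override`, the drop-in file name, and the 30 second default are assumptions for illustration; `--timeout` and `-q` are the options named above, and the binary path is the one used in Lei Yu's override.

```shell
# Hypothetical helper: write a drop-in that shortens the wait-online timeout.
# $1 = root of the target filesystem (empty string for the running system)
# $2 = timeout in seconds (assumed default of 30, tune per platform)
install_wait_online_override() {
    root="$1"
    timeout="${2:-30}"
    dir="$root/etc/systemd/system/systemd-networkd-wait-online.service.d"
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<EOF
[Service]
# An empty ExecStart= clears the packaged command before setting a new one.
ExecStart=
ExecStart=/lib/systemd/systemd-networkd-wait-online --timeout=$timeout -q
EOF
}
```

On a running BMC this would be called as `install_wait_online_override "" 30`, followed by `systemctl daemon-reload` so systemd picks up the drop-in.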

On 2/17/22 18:29, Lei Yu wrote:
> On Fri, Feb 18, 2022 at 8:11 AM Jeremy Kerr<jk@codeconstruct.com.au>  wrote:
>> Hi Johnathan,
>>
>>> Issue: systemd-networkd-wait-online.service stalls for 120 seconds
>>> when the managed NICs do not have a network connection.
>>>
>>> TLDR: Should OpenBMC remove systemd-networkd-wait-online.service
>>> universally?
>> Probably not, it's required to implement network-online.target, which
>> is standard, and may be referred to by arbitrary packages.
>>
>> There's some good background on the issues you're experiencing here:
>>
>>   https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
>>
>> in short: most services should be able to start before network-
>> online.target, and be able to adapt to changes in network configuration
>> after that point.
>>
>> For your specific issue there:
>>
>>> Issues: This service blocks entry to multi-user.target.
>>> phosphor-state-manager uses multi-user.target to report the BMC is
>>> ready to use.
>>> This is reported via IPMI Get Device ID.
>> That sounds like more of an issue of whether that reported state
>> actually represents the expected BMC state...
> We have an internal "override" config to start
> systemd-networkd-wait-online with --any option:
>
>   # override.conf
>   [Service]
>   ExecStart=
>   ExecStart=/lib/systemd/systemd-networkd-wait-online --any
>
> This is mostly about fixing the QEMU CI.
> In the real environment the network *should* be up and online so the
> above makes systemd-networkd-wait-online start OK in both cases.

-- 
Johnathan Mantey
Senior Software Engineer
azad technology partners
Contributing to Technology Innovation since 1992
Phone: (503) 712-6764
Email: johnathanx.mantey@intel.com



* Re: Checking for network online
  2022-02-17 22:54 Checking for network online Johnathan Mantey
  2022-02-18  0:11 ` Jeremy Kerr
@ 2022-02-18 19:04 ` Doman, Jonathan
  2022-02-18 19:39   ` Johnathan Mantey
  2022-03-02  6:24 ` Ratan Gupta
  2 siblings, 1 reply; 18+ messages in thread
From: Doman, Jonathan @ 2022-02-18 19:04 UTC (permalink / raw)
  To: openbmc, Mantey, JohnathanX

On Thu, 2022-02-17 at 14:54 -0800, Johnathan Mantey wrote:
> Intel has recently run into an issue regarding a systemd service, and
> we're interested in soliciting feedback from the community.
> 
> Issue: systemd-networkd-wait-online.service stalls for 120 seconds when 
> the managed NICs do not have a network connection.
> 
> TLDR: Should OpenBMC remove systemd-networkd-wait-online.service 
> universally?
> 
> System Config: All NICs in the system are not connected to an active 
> network.
> 
> Test Process: The system under test (SUT) has AC removed, and some time 
> later AC applied. Wait for BMC/BIOS to boot
> 
> Behavior: U-Boot hands control to the Linux boot process, and the 
> systemd services are started. When systemd-networkd-wait-online.service 
> starts it stalls waiting for the NICs to enter a fully functional state. 
> This never happens during the default 120 second timeout period for this 
> service. When the timeout elapses, an error message is logged to the 
> journal reporting the service exited unsuccessfully.
> 
> Issues: This service blocks entry to multi-user.target.
> phosphor-state-manager uses multi-user.target to report the BMC is ready 
> to use.
> This is reported via IPMI Get Device ID.
> The Intel BIOS is blocked from booting until 
> systemd-networkd-wait-online times out.
> BMC entry to multi-user.target is delayed. Journal entries are created.
> 
> Question for the community: Given the negative side effects caused by 
> running this service does the community want to have this service 
> collectively removed from global build image?

I think the initial discussion in #general got to the root of the
issue: multi-user.target Wants rsyslog.service, which in turn is
ordered After network-online.target. rsyslog seems to be the only thing
tying multi-user to network-online.

Did you try removing the Wants/After=network-online.target from
rsyslog.service to see if the situation improves? If it does, then we
can discuss removing that dependency or making it configurable.
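
For anyone wanting to try that experiment, a minimal sketch of stripping the dependency from a unit file with `sed`. The helper name `strip_network_online` is hypothetical, and the sketch assumes the dependency appears on `Wants=` or `After=` lines, possibly alongside other units on the same line.

```shell
# Hypothetical helper: drop network-online.target from Wants=/After= lines.
# $1 = path to a unit file, e.g. a copy of rsyslog.service
strip_network_online() {
    unit_file="$1"
    tmp_file="$unit_file.tmp"
    # First expression removes the target from dependency lines; the second
    # deletes any Wants=/After= line that is left with no value at all.
    sed -E \
        -e '/^(Wants|After)=/ s/[[:space:]]*network-online\.target//g' \
        -e '/^(Wants|After)=[[:space:]]*$/d' \
        "$unit_file" > "$tmp_file" && mv "$tmp_file" "$unit_file"
}
```

Running it against a copy of the unit first and diffing the result is a cheap way to confirm that nothing else on those lines was disturbed.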


* Re: Checking for network online
  2022-02-18 19:04 ` Doman, Jonathan
@ 2022-02-18 19:39   ` Johnathan Mantey
  2022-02-23 13:58     ` Patrick Williams
  0 siblings, 1 reply; 18+ messages in thread
From: Johnathan Mantey @ 2022-02-18 19:39 UTC (permalink / raw)
  To: Doman, Jonathan, openbmc





On 2/18/22 11:04, Doman, Jonathan wrote:
> On Thu, 2022-02-17 at 14:54 -0800, Johnathan Mantey wrote:
>> Intel has recently run into an issue regarding a systemd service, and
>> we're interested in soliciting feedback from the community.
>>
>> Issue: systemd-networkd-wait-online.service stalls for 120 seconds when
>> the managed NICs do not have a network connection.
>>
>> TLDR: Should OpenBMC remove systemd-networkd-wait-online.service
>> universally?
>>
>> System Config: All NICs in the system are not connected to an active
>> network.
>>
>> Test Process: The system under test (SUT) has AC removed, and some time
>> later AC applied. Wait for BMC/BIOS to boot
>>
>> Behavior: U-Boot hands control to the Linux boot process, and the
>> systemd services are started. When systemd-networkd-wait-online.service
>> starts it stalls waiting for the NICs to enter a fully functional state.
>> This never happens during the default 120 second timeout period for this
>> service. When the timeout elapses, an error message is logged to the
>> journal reporting the service exited unsuccessfully.
>>
>> Issues: This service blocks entry to multi-user.target.
>> phosphor-state-manager uses multi-user.target to report the BMC is ready
>> to use.
>> This is reported via IPMI Get Device ID.
>> The Intel BIOS is blocked from booting until
>> systemd-networkd-wait-online times out.
>> BMC entry to multi-user.target is delayed. Journal entries are created.
>>
>> Question for the community: Given the negative side effects caused by
>> running this service does the community want to have this service
>> collectively removed from global build image?
> 
> I think the initial discussion in #general got to the root of the
> issue: multi-user.target Wants rsyslog.service, which in turn is
> ordered After network-online.target. rsyslog seems to be the only thing
> tying multi-user to network-online.

I assume you mean OpenBMC Discord #general channel?

> 
> Did you try removing the Wants/After=network-online.target from
> rsyslog.service to see if the situation improves? If it does, then we
> can discuss removing that dependency or making it configurable.

No, I had not tried that. My take is that doing so will be like
playing whack-a-mole: some other service may later decide to rely on
systemd-networkd-wait-online, and the issue is compounded as a result.

I basically took it on faith that rsyslog needed this service. I did not 
investigate what issues arise in rsyslog when no network is present.

Ultimately there will be times where the BMC will have to operate sans 
network. It's unfortunate that the wait-online service doesn't seem to 
perform the expected operation for the --ignore and the min/max 
operational state functions. This may be a mismatch between my 
expectations and the actual implementation of wait-online.

-- 
Johnathan Mantey
Senior Software Engineer
azad technology partners
Contributing to Technology Innovation since 1992
Phone: (503) 712-6764
Email: johnathanx.mantey@intel.com <mailto:johnathanx.mantey@intel.com>



* Re: Checking for network online
  2022-02-18 16:11     ` Johnathan Mantey
@ 2022-02-23  2:09       ` Jiaqing Zhao
  2022-02-23 13:48         ` Patrick Williams
  0 siblings, 1 reply; 18+ messages in thread
From: Jiaqing Zhao @ 2022-02-23  2:09 UTC (permalink / raw)
  To: Johnathan Mantey, Lei Yu, Jeremy Kerr; +Cc: OpenBMC Maillist

I think a solution is to set RequiredForOnline=no (https://www.freedesktop.org/software/systemd/man/systemd.network.html#RequiredForOnline=) in all network interface configs. This option makes systemd-networkd-wait-online.service skip the interface. Canonical netplan (used in Ubuntu Server) also uses this option to skip the online check for a given interface (https://github.com/canonical/netplan/blob/main/src/networkd.c#L636-L639).

I'll submit a patch to phosphor-networkd later.
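
For illustration, a minimal sketch of what that option looks like in a systemd-networkd configuration, written as a shell helper that adds a drop-in next to an existing `.network` file. The helper name, the drop-in file name, and the `00-bmc-eth0.network` name in the usage note are assumptions; this is not the phosphor-networkd patch itself.

```shell
# Hypothetical helper: mark one interface as not required for "online".
# $1 = directory holding the .network files (e.g. /etc/systemd/network)
# $2 = name of the existing .network file to extend
set_not_required_for_online() {
    net_dir="$1"
    network_file="$2"
    dropin_dir="$net_dir/$network_file.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/10-not-required.conf" <<'EOF'
[Link]
# systemd-networkd-wait-online skips links with RequiredForOnline=no.
RequiredForOnline=no
EOF
}
```

A call such as `set_not_required_for_online /etc/systemd/network 00-bmc-eth0.network` would then need systemd-networkd to reload its configuration before it takes effect.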

On 2022-02-19 00:11, Johnathan Mantey wrote:
> Reading the --any switch in the systemd-networkd-wait-online man page doesn't look like it would be helpful. That flag permits the service to move on when one of the NICs achieves 'online' functionality. In the case of a NIC w/o a cable connection 'online' never happens. Thus the default 120 second timeout is still going to elapse, BMC ready is going to be held off, BIOS is going to delay completion (in our BIOS), and an error message is still going to be logged.
> 
> It appears, based on comments so far, that my best way forward with the current implementation of wait-online is to assign "--timeout <number-smaller-than-120> -q" to reduce the amount of time for testing the NIC state, and to never log an error because the NIC was unplugged.
> 
> Gating on operational state, and issuing --ignore flags didn't work, leaving a large blunt instrument for a solution.
> 
> On 2/17/22 18:29, Lei Yu wrote:
>> On Fri, Feb 18, 2022 at 8:11 AM Jeremy Kerr<jk@codeconstruct.com.au>  wrote:
>>> Hi Johnathan,
>>>
>>>> Issue: systemd-networkd-wait-online.service stalls for 120 seconds
>>>> when the managed NICs do not have a network connection.
>>>>
>>>> TLDR: Should OpenBMC remove systemd-networkd-wait-online.service
>>>> universally?
>>> Probably not, it's required to implement network-online.target, which
>>> is standard, and may be referred to by arbitrary packages.
>>>
>>> There's some good background on the issues you're experiencing here:
>>>
>>>   https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
>>>
>>> in short: most services should be able to start before network-
>>> online.target, and be able to adapt to changes in network configuration
>>> after that point.
>>>
>>> For your specific issue there:
>>>
>>>> Issues: This service blocks entry to multi-user.target.
>>>> phosphor-state-manager uses multi-user.target to report the BMC is
>>>> ready to use.
>>>> This is reported via IPMI Get Device ID.
>>> That sounds like more of an issue of whether that reported state
>>> actually represents the expected BMC state...
>> We have an internal "override" config to start
>> systemd-networkd-wait-online with --any option:
>>
>>   # override.conf
>>   [Service]
>>   ExecStart=
>>   ExecStart=/lib/systemd/systemd-networkd-wait-online --any
>>
>> This is mostly about fixing the QEMU CI.
>> In the real environment the network *should* be up and online so the
>> above makes systemd-networkd-wait-online start OK in both cases.
> 


* Re: Checking for network online
  2022-02-23  2:09       ` Jiaqing Zhao
@ 2022-02-23 13:48         ` Patrick Williams
  2022-02-23 17:44           ` Jiaqing Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick Williams @ 2022-02-23 13:48 UTC (permalink / raw)
  To: Jiaqing Zhao; +Cc: Jeremy Kerr, OpenBMC Maillist, Lei Yu, Johnathan Mantey


On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
> I think a solution is to set RequiredForOnline=no (https://www.freedesktop.org/software/systemd/man/systemd.network.html#RequiredForOnline=) in all network interface config. This option skips the interface when running systemd-networkd-wait-online.service. Canonical netplan (used in ubuntu server) also uses this option to skip the online check for given interface (https://github.com/canonical/netplan/blob/main/src/networkd.c#L636-L639).
> 
> I'll submit a patch to phosphor-networkd later.

I really don't think this is appropriate for all systems.  Services have
dependencies on network-online.target for a reason.  If the side-effect of
having the BMC network cable unplugged is that the host doesn't boot, that might
be entirely reasonable behavior in some environments.

We use rsyslog as the mechanism to offload our BMC logging data to an
aggregation point.  When you have a very large scale deployment, it is actually
better for the system to not come online than for us to lose out on that data,
since we have spare capacity to take its place.

Note that the Canonical netplan only applies this option if the configuration
indicates that the interface is optional, which is entirely appropriate.  The
way you wrote it could have been interpreted that they set this on *every*
interface, which is what it seems like you're proposing to do to
phosphor-networkd.

If this is desired behavior for someone, can't you supply a wildcard .network
file that adds this option, rather than modifying phosphor-networkd to manually
add it to each network interface that it is managing?
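
The wildcard approach suggested above could look like the sketch below. The helper name and the `99-optional.network` file name are assumptions. Note that systemd-networkd applies only the first `.network` file (in alphanumeric order) that matches an interface, so a late-sorting wildcard like this one would only affect interfaces that no earlier file already claims.

```shell
# Hypothetical helper: install a catch-all .network file.
# $1 = directory holding the .network files (e.g. /etc/systemd/network)
install_wildcard_optional() {
    net_dir="$1"
    mkdir -p "$net_dir"
    cat > "$net_dir/99-optional.network" <<'EOF'
[Match]
# Glob that matches every interface name.
Name=*

[Link]
RequiredForOnline=no
EOF
}
```

Whether the wildcard should sort before or after the per-interface files is a system design decision, since it determines which interfaces end up exempt from the online check.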

I believe some designs use a USB network device to connect two internal pieces
of the system and those interfaces are not necessarily managed by
phosphor-networkd (interfaces that, for example connect BMC-to-BMC or
BMC-to-Host).  While it is obviously up to the system designer to work through
this bug, by applying this configuration as you proposed you are causing
unusual default behavior in that networkd is going to start waiting for these
internal connections to come online instead of the external interface.

-- 
Patrick Williams


* Re: Checking for network online
  2022-02-18 19:39   ` Johnathan Mantey
@ 2022-02-23 13:58     ` Patrick Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick Williams @ 2022-02-23 13:58 UTC (permalink / raw)
  To: Johnathan Mantey; +Cc: Doman, Jonathan, openbmc


On Fri, Feb 18, 2022 at 11:39:03AM -0800, Johnathan Mantey wrote:
> On 2/18/22 11:04, Doman, Jonathan wrote:
> > On Thu, 2022-02-17 at 14:54 -0800, Johnathan Mantey wrote:

> > 
> > Did you try removing the Wants/After=network-online.target from
> > rsyslog.service to see if the situation improves? If it does, then we
> > can discuss removing that dependency or making it configurable.
> 
> No, I had not tried that. My take on doing so is that it'll be like 
> playing whack a mole. Some other service may decide to rely on 
> systemd-networkd-wait-online. The issue is now compounded as a result.

I don't understand why this is a particularly difficult problem.  You asked me
on Discord how I figured this out and I said "grep".  You literally just look
for a service that depends on `network-online.target`.

No service should depend on systemd-networkd-wait-online directly.  If they do,
this is a bug.  They are always depending on network-online.target.

Services depend on network-online.target for valid reasons.  As far as I can
tell, currently, only rsyslog* has this dependency (at least on Witherspoon and
Bletchley).  If a new service adds this dependency, and you have an issue with
it, I think you should take some time to reason about why this dependency was
added rather than simply ignoring it (by a force-disable on
systemd-networkd-wait-online).

"Fixing" the rsyslog dependency in your system is a pretty trivial
`do_install:append` with a `sed` in it to strip out the line.  Catching new
dependencies is a fairly simple `ROOTFS_POSTPROCESS_COMMAND` to check all the
service files in `/lib/systemd/system` for a `network-online.target` dependency.

* As I mentioned in Discord, rsyslog having this dependency by default makes
  good sense from the upstream's perspective.  rsyslog is _typically_
  configured for remote offload of the syslog (hence, r in rsyslog).  It just
  happens that we have this unusual Rube Goldberg transformation using rsyslog
  for formulating the SEL and Redfish log files.  For that transformation the
  dependency isn't necessary, but for people who use rsyslog as it is intended
  they probably want the dependency (as I've mentioned in another email we do).
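
A minimal sketch of the "grep" check described above, as a shell function that lists service files referencing `network-online.target`. The function name is hypothetical, and the set of directives it looks at (`After=`, `Wants=`, `Requires=`, `BindsTo=`) is an assumption about which dependency types matter here.

```shell
# Hypothetical check: list service files that depend on network-online.target.
# $1 = directory of unit files (e.g. the image's /lib/systemd/system)
# Prints matching file paths; exits 0 if any were found, non-zero otherwise.
find_network_online_deps() {
    unit_dir="$1"
    grep -l -E '^(After|Wants|Requires|BindsTo)=.*network-online\.target' \
        "$unit_dir"/*.service 2>/dev/null
}
```

In a `ROOTFS_POSTPROCESS_COMMAND`, the same function could be pointed at the service directory inside the image root filesystem and made to fail the build when it reports a unit that is not on an expected list.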

-- 
Patrick Williams


* Re: Checking for network online
  2022-02-23 13:48         ` Patrick Williams
@ 2022-02-23 17:44           ` Jiaqing Zhao
  2022-02-23 18:36             ` Bills, Jason M
                               ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Jiaqing Zhao @ 2022-02-23 17:44 UTC (permalink / raw)
  To: Patrick Williams; +Cc: Jeremy Kerr, OpenBMC Maillist, Lei Yu, Johnathan Mantey

On 2022-02-23 21:48, Patrick Williams wrote:
> On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
>> I think a solution is to set RequiredForOnline=no (https://www.freedesktop.org/software/systemd/man/systemd.network.html#RequiredForOnline=) in all network interface config. This option skips the interface when running systemd-networkd-wait-online.service. Canonical netplan (used in ubuntu server) also uses this option to skip the online check for given interface (https://github.com/canonical/netplan/blob/main/src/networkd.c#L636-L639).
>>
>> I'll submit a patch to phosphor-networkd later.
> 
> I really don't think this is appropriate for all systems.  Services have
> dependencies on network-online.target for a reason.  If the side-effect of
> having the BMC network cable unplugged is that the host doesn't boot, that might
> be entirely reasonable behavior in some environments.
> 
> We use rsyslog as the mechanism to offload our BMC logging data to an
> aggregation point.  When you have a very large scale deployment, it is actually
> better for the system to not come online than for us to lose out on that data,
> since we have spare capacity to take its place.

My understanding is that in OpenBMC, the purpose of using rsyslog is to format the Redfish and IPMI SEL logs from the system journal. The "r" of rsyslogd is not used in most cases. I think "network not available" can be handled the same way as "server misconfigured" in rsyslogd: in both cases it fails to connect to the server, and may exit or print some error messages. (I have not tried this yet.)

Johnathan mentioned in his initial email that the 120s wait blocks multi-user.target. Considering that most production hardware has no BMC serial port, when the BMC has no network connection the only way to interact with it is IPMI from the host. However, IPMI services are started in multi-user.target, so if the BMC waits indefinitely for the network to come online, there would be no way to debug the issue.

> Note that the Canonical netplan only applies this option if the configuration
> indicates that the interface is optional, which is entirely appropriate.  The
> way you wrote it could have been interpreted that they set this on *every*
> interface, which is what it seems like you're proposing to do to
> phosphor-networkd
>
> If this is desired behavior for someone, can't you supply a wildcard .network
> file that adds this option, rather than modifying phosphor-networkd to manually
> add it to each network interface that it is managing?

Maybe we can add a similar DBus property, as netplan does? Reading and writing systemd-networkd config files is feasible in phosphor-networkd. The default value can be assigned via a build option.
 
> I believe some designs use a USB network device to connect two internal pieces
> of the system and those interfaces are not necessarily managed by
> phosphor-networkd (interfaces that, for example connect BMC-to-BMC or
> BMC-to-Host).  While it is obviously up to the system designer to work through
> this bug, by applying this configuration as you proposed you are causing
> unusual default behavior in that networkd is going to start waiting for these
> internal connections to come online instead of the external interface.

I think this is an extremely rare case; internal interfaces should be configurable. For example, the host OS can change the IP of its BMC-Host virtual interface, so the BMC should be able to change its own as well, and for BMC-to-BMC interfaces it is impossible to assign a fixed LAN IP in manufacturing without conflicts. The easiest way to configure them is to use phosphor-networkd.

Even if an interface is not managed by phosphor-networkd, keeping the default RequiredForOnline=yes will cause the 120s wait on BMC boot. Developers can simply search for it and find the solution. I remember it showing a timer with a message on the BMC serial console; that's how I found that I should set "optional" on my Ubuntu server.


* Re: Checking for network online
  2022-02-23 17:44           ` Jiaqing Zhao
@ 2022-02-23 18:36             ` Bills, Jason M
  2022-02-23 18:58               ` Patrick Williams
  2022-02-23 18:55             ` Patrick Williams
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Bills, Jason M @ 2022-02-23 18:36 UTC (permalink / raw)
  To: openbmc



On 2/23/2022 10:44 AM, Jiaqing Zhao wrote:
> On 2022-02-23 21:48, Patrick Williams wrote:
>> On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
<snip>...
> 
> My understanding is that in OpenBMC, the propose to use rsyslog is to format the Redfish and IPMI SEL logs from system journal. The "r" of rsyslogd is not used in most cases.
> 

Just a nit-picky side-note: The "r" in rsyslogd is for "rocket-fast".  I 
don't believe rsyslogd is inherently designed for remote logging.
https://www.rsyslog.com/


* Re: Checking for network online
  2022-02-23 17:44           ` Jiaqing Zhao
  2022-02-23 18:36             ` Bills, Jason M
@ 2022-02-23 18:55             ` Patrick Williams
  2022-02-23 20:04             ` Johnathan Mantey
  2022-03-01 19:56             ` Milton Miller II
  3 siblings, 0 replies; 18+ messages in thread
From: Patrick Williams @ 2022-02-23 18:55 UTC (permalink / raw)
  To: Jiaqing Zhao; +Cc: Jeremy Kerr, OpenBMC Maillist, Lei Yu, Johnathan Mantey


On Thu, Feb 24, 2022 at 01:44:18AM +0800, Jiaqing Zhao wrote:
> On 2022-02-23 21:48, Patrick Williams wrote:
> > On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
> >> I think a solution is to set RequiredForOnline=no (https://www.freedesktop.org/software/systemd/man/systemd.network.html#RequiredForOnline=) in all network interface config. This option skips the interface when running systemd-networkd-wait-online.service. Canonical netplan (used in ubuntu server) also uses this option to skip the online check for given interface (https://github.com/canonical/netplan/blob/main/src/networkd.c#L636-L639).
> >>
> >> I'll submit a patch to phosphor-networkd later.
> > 
> > I really don't think this is appropriate for all systems.  Services have
> > dependencies on network-online.target for a reason.  If the side-effect of
> > having the BMC network cable unplugged is that the host doesn't boot, that might
> > be entirely reasonable behavior in some environments.
> > 
> > We use rsyslog as the mechanism to offload our BMC logging data to an
> > aggregation point.  When you have a very large scale deployment, it is actually
> > better for the system to not come online than for us to lose out on that data,
> > since we have spare capacity to take its place.
> 
> My understanding is that in OpenBMC, the propose to use rsyslog is to format the Redfish and IPMI SEL logs from system journal. The "r" of rsyslogd is not used in most cases. 

I might have left some ambiguity in 'we' in this context.  I meant 'the
deployments I am working on'.  I believe at least one other company leverages
this as well.

> I think the "network not available" can be handled same as "server misconfigured" in rsyslogd, as in both cases it fails to connect to the server, and may exit or print some error messages? (not tried yet)

That is probably true, but it means that I can't offload any data about the
system in the meantime.  Like I said, I'd rather leave the system out of my
deployment if it is degraded.

> 
> Jonathan mentions that the 120s wait blocks multi-user.target in his initial email. Considering that there is no BMC serial port in most production hardware, when BMC has no network connection, the only way to interact with BMC is to use IPMI in host.

Your assertion "no BMC serial port in most production hardware" might be true
globally speaking.  It isn't necessarily true for any particular deployment.

With the 120s wait time, is rsyslog actually running after that? Or has it
failed?  I guess since it has a Wants and not a Requires on network-online,
it'll still start up after the 120s timeout of systemd-networkd-wait-online.
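
For reference, the Wants-versus-Requires distinction mentioned above might look
like this in a service unit.  The directives are standard systemd ones, but the
fragment is only an illustrative sketch and is not copied from the actual
rsyslog unit file:

```ini
[Unit]
# Wants= is a weak dependency: it pulls network-online.target into the
# boot transaction, but this service is still started even if the
# wanted unit fails to start.
Wants=network-online.target
# After= only orders this service behind the target; it does not pull
# the target in by itself.
After=network-online.target
# Requires= would be the strict form: combined with After=, this service
# would not be started if the required unit failed to start.
#Requires=network-online.target
```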

My understanding of systemd-networkd's defaults here is that it waits for DHCP
in order for network-online.target to pass.  You can have the IPv6-LL address
configured still, which can allow remote access, even if the IPv6-DHCP address
is not assigned.

> However, IPMI services are started in multi-user.target; if the BMC waits indefinitely for the network to come online, there would be no way to debug the issue.

Sure, but the BMC doesn't wait forever, does it?  It just waits 120s.

I'm not suggesting this isn't the right solution for your systems, or even that
it might not be the right solution for most systems, but I don't think it is the
right solution for _all_ systems so we need to ensure it can be opt-out.

> 
> > Note that the Canonical netplan only applies this option if the configuration
> > indicates that the interface is optional, which is entirely appropriate.  The
> > way you wrote it could have been interpreted that they set this on *every*
> > interface, which is what it seems like you're proposing to do to
> > phosphor-networkd
> >
> > If this is desired behavior for someone, can't you supply a wildcard .network
> > file that adds this option, rather than modifying phosphor-networkd to manually
> > add it to each network interface that it is managing?
> 
> Maybe we can add a similar DBus property like how netplan does? Reading/writing systemd-networkd config files is feasible in phosphor-networkd. Default value can be assigned via build option.

I'm not sure if it belongs as a DBus property or not.  I'd have to see what
you're proposing and think about it.  I think this is a system design constraint
and not really configurable by users (hence why exposing a DBus property might
not make sense) but maybe I'm wrong on this.

> > I believe some designs use a USB network device to connect two internal pieces
> > of the system and those interfaces are not necessarily managed by
> > phosphor-networkd (interfaces that, for example connect BMC-to-BMC or
> > BMC-to-Host).  While it is obviously up to the system designer to work through
> > this bug, by applying this configuration as you proposed you are causing
> > unusual default behavior in that networkd is going to start waiting for these
> > internal connections to come online instead of the external interface.
> 
> I think this is an extremely rare case; internal interfaces should be configurable. For example, the host OS can change the IP of its BMC-Host virtual interface, and the BMC should also be able to change its own; for BMC-to-BMC interfaces, it is impossible to assign a fixed LAN IP without conflicts in manufacturing.

I don't follow your concern here.  We can (and do) easily assign a static IP
address for the BMC-to-BMC interfaces based on position information fed into the
BMC via GPIO signals.

> The easiest way to configure it is to utilize the phosphor-networkd.
> 
> Even if it is not managed by phosphor-networkd, keeping the default RequiredForOnline=yes will cause the 120s wait on BMC boot. Developers can simply search for it and find the solution. I remember it shows a timer with a message on the BMC serial console; that's how I found I should set "optional" on my Ubuntu server.

Agreed.  Someone _can_ find it and debug it.  To me, though, it is not an
obvious or easy thing to work out, because automated "network down" test cases
are not often run in my experience.

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Checking for network online
  2022-02-23 18:36             ` Bills, Jason M
@ 2022-02-23 18:58               ` Patrick Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick Williams @ 2022-02-23 18:58 UTC (permalink / raw)
  To: Bills, Jason M; +Cc: openbmc

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

On Wed, Feb 23, 2022 at 11:36:50AM -0700, Bills, Jason M wrote:
> > My understanding is that in OpenBMC, the purpose of using rsyslog is to format the Redfish and IPMI SEL logs from the system journal. The "r" of rsyslogd is not used in most cases.
> > 
> 
> Just a nit-picky side-note: The "r" in rsyslogd is for "rocket-fast".  I 
> don't believe rsyslogd is inherently designed for remote logging.
> https://www.rsyslog.com/

#TIL

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Checking for network online
  2022-02-23 17:44           ` Jiaqing Zhao
  2022-02-23 18:36             ` Bills, Jason M
  2022-02-23 18:55             ` Patrick Williams
@ 2022-02-23 20:04             ` Johnathan Mantey
  2022-02-24 20:09               ` Patrick Williams
  2022-03-01 19:56             ` Milton Miller II
  3 siblings, 1 reply; 18+ messages in thread
From: Johnathan Mantey @ 2022-02-23 20:04 UTC (permalink / raw)
  To: Jiaqing Zhao, Patrick Williams; +Cc: Jeremy Kerr, OpenBMC Maillist, Lei Yu


[-- Attachment #1.1: Type: text/plain, Size: 6238 bytes --]



On 2/23/22 09:44, Jiaqing Zhao wrote:
> On 2022-02-23 21:48, Patrick Williams wrote:
>> On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
>>> I think a solution is to set RequiredForOnline=no (https://www.freedesktop.org/software/systemd/man/systemd.network.html#RequiredForOnline=) in all network interface config. This option skips the interface when running systemd-networkd-wait-online.service. Canonical netplan (used in ubuntu server) also uses this option to skip the online check for given interface (https://github.com/canonical/netplan/blob/main/src/networkd.c#L636-L639).
>>>
>>> I'll submit a patch to phosphor-networkd later.
>>
>> I really don't think this is appropriate for all systems.  Services have
>> dependencies on network-online.target for a reason.  If the side-effect of
>> having the BMC network cable unplugged is that the host doesn't boot, that might
>> be entirely reasonable behavior in some environments.
>>
>> We use rsyslog as the mechanism to offload our BMC logging data to an
>> aggregation point.  When you have a very large scale deployment, it is actually
>> better for the system to not come online than for us to lose out on that data,
>> since we have spare capacity to take its place.
> 
> My understanding is that in OpenBMC, the purpose of using rsyslog is to format the Redfish and IPMI SEL logs from the system journal. The "r" of rsyslogd is not used in most cases. I think the "network not available" case can be handled the same as "server misconfigured" in rsyslogd, as in both cases it fails to connect to the server, and may exit or print some error messages? (not tried yet)
> 
> Johnathan mentions in his initial email that the 120s wait blocks multi-user.target. Considering that there is no BMC serial port in most production hardware, when the BMC has no network connection, the only way to interact with it is to use IPMI from the host. However, IPMI services are started in multi-user.target; if the BMC waits indefinitely for the network to come online, there would be no way to debug the issue.
> 
>> Note that the Canonical netplan only applies this option if the configuration
>> indicates that the interface is optional, which is entirely appropriate.  The
>> way you wrote it could have been interpreted that they set this on *every*
>> interface, which is what it seems like you're proposing to do to
>> phosphor-networkd
>>
>> If this is desired behavior for someone, can't you supply a wildcard .network
>> file that adds this option, rather than modifying phosphor-networkd to manually
>> add it to each network interface that it is managing?
> 
> Maybe we can add a similar DBus property like how netplan does? Reading/writing systemd-networkd config files is feasible in phosphor-networkd. Default value can be assigned via build option.
>   
>> I believe some designs use a USB network device to connect two internal pieces
>> of the system and those interfaces are not necessarily managed by
>> phosphor-networkd (interfaces that, for example connect BMC-to-BMC or
>> BMC-to-Host).  While it is obviously up to the system designer to work through
>> this bug, by applying this configuration as you proposed you are causing
>> unusual default behavior in that networkd is going to start waiting for these
>> internal connections to come online instead of the external interface.
> 
> I think this is an extremely rare case; internal interfaces should be configurable. For example, the host OS can change the IP of its BMC-Host virtual interface, and the BMC should also be able to change its own; for BMC-to-BMC interfaces, it is impossible to assign a fixed LAN IP without conflicts in manufacturing. The easiest way to configure it is to use phosphor-networkd.
> 
> Even if it is not managed by phosphor-networkd, keeping the default RequiredForOnline=yes will cause the 120s wait on BMC boot. Developers can simply search for it and find the solution. I remember it shows a timer with a message on the BMC serial console; that's how I found I should set "optional" on my Ubuntu server.

FWIW, my experimentation with systemd-networkd-wait-online was not 
successful in doing much to change the 120 second timeout.

Setting the RequiredForOnline entry to false in systemd.network did not 
prevent the 120 second timeout from elapsing.
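
For reference, a minimal sketch of the kind of .network setting described 
above. The file name and the eth0 match are assumptions for illustration 
only; RequiredForOnline= belongs in the [Link] section of a 
systemd.network file:

```ini
# Hypothetical file: /etc/systemd/network/00-bmc-eth0.network
[Match]
Name=eth0

[Link]
# Ask systemd-networkd-wait-online to skip this link when deciding
# whether the network is online.
RequiredForOnline=no
```

Only the first .network file that matches an interface is applied, so a 
setting like this has to live in the file (or a drop-in for the file) 
that actually configures the link.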

Setting any of the following switches in the service file failed to 
eliminate the timeout:
--ignore=eth0
--interface=eth0:no-carrier            # overrides RequiredForOnline
--interface=eth0:no-carrier:no-carrier # <- probably a bad setting in
                                        # hindsight

It appears systemd-networkd-wait-online expects some state greater than 
no-carrier to consider the link online, thus allowing it to exit with a 
SUCCESS exit code. This holds even when it is explicitly told that 
no-carrier counts as "online".

The only switch that seemed to perform as expected in this instance was 
--timeout. Assigning a value less than 120 to the --timeout control did 
reduce the wait period. It does assign a SUCCESS exit code upon timing 
out, which is expected behavior.
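
As a rough sketch, a shorter timeout could be applied with a service 
drop-in along these lines. The drop-in path, the binary path, and the 
30 second value are all assumptions for illustration; --timeout is the 
only switch taken from the experiment above:

```ini
# Hypothetical drop-in:
# /etc/systemd/system/systemd-networkd-wait-online.service.d/timeout.conf
[Service]
# An empty ExecStart= clears the command inherited from the packaged unit.
ExecStart=
# The binary path is an assumption; check the ExecStart= line of the
# shipped unit file for the path used on your image.
ExecStart=/lib/systemd/systemd-networkd-wait-online --timeout=30
```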

systemd-networkd-wait-online appears to have logic preventing no-carrier 
state from being assigned as the "network online" value.

rsyslogd has both a network and network-online target. If the 
network-online target is removed, then systemd-networkd-wait-online 
doesn't run, and any configuration of that service appears to be 
pointless. The conclusion I have from that is that network-online.target 
is a valid configuration option for a service to assign.

There may be OpenBMC-powered servers that do use the distributed logging 
provided by rsyslogd. If there are, then globally removing network-online 
from the rsyslog service file is undesirable. I consider the same to be 
true of assigning a default RequiredForOnline=false.

Based on the above, it's my opinion this is a vendor-based decision for 
how to configure rsyslog/systemd-networkd-wait-online.

-- 
Johnathan Mantey
Senior Software Engineer
*azad technology partners*
Contributing to Technology Innovation since 1992
Phone: (503) 712-6764
Email: johnathanx.mantey@intel.com


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]


* Re: Checking for network online
  2022-02-23 20:04             ` Johnathan Mantey
@ 2022-02-24 20:09               ` Patrick Williams
  2022-03-02  6:15                 ` Jiaqing Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick Williams @ 2022-02-24 20:09 UTC (permalink / raw)
  To: Johnathan Mantey; +Cc: Jiaqing Zhao, Jeremy Kerr, OpenBMC Maillist, Lei Yu

[-- Attachment #1: Type: text/plain, Size: 975 bytes --]

On Wed, Feb 23, 2022 at 12:04:12PM -0800, Johnathan Mantey wrote:
> On 2/23/22 09:44, Jiaqing Zhao wrote:
> > On 2022-02-23 21:48, Patrick Williams wrote:
> >> On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:

> There may be openbmc powered servers that do use the distributed logging 
> provided by rsyslogd. If there are then globally removing network-online 
> from the rsyslog service file is undesirable. I consider the same to be 
> true of assigning a default RequiredForOnline=false.
> 
> Based on the above, it's my opinion this is a vendor based decision for 
> how to configure rsyslog/systemd-networkd-wait-online.

I agree we shouldn't enable this globally, but that doesn't mean we can't add
a simple PKGCONFIG that allows it to be enabled/disabled as needed.  That way
we only have the single `PKGCONFIG:append` line in vendor layers and vendors
that have a problem with it can leave it the same as upstream.

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* RE: Checking for network online
  2022-02-23 17:44           ` Jiaqing Zhao
                               ` (2 preceding siblings ...)
  2022-02-23 20:04             ` Johnathan Mantey
@ 2022-03-01 19:56             ` Milton Miller II
  3 siblings, 0 replies; 18+ messages in thread
From: Milton Miller II @ 2022-03-01 19:56 UTC (permalink / raw)
  To: Johnathan Mantey; +Cc: Jiaqing Zhao, OpenBMC Maillist, Lei Yu, Jeremy Kerr

On Feb 23, 2022,  Johnathan Mantey wrote:
>On 2/23/22 09:44, Jiaqing Zhao wrote:
>> On 2022-02-23 21:48, Patrick Williams wrote:
>>> On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
>>>> I think a solution is to set RequiredForOnline=no
>>>> (https://www.freedesktop.org/software/systemd/man/systemd.network.html#RequiredForOnline=)
>>>> in all network interface config. This option skips the interface
>>>> when running systemd-networkd-wait-online.service. Canonical netplan
>>>> (used in ubuntu server) also uses this option to skip the online
>>>> check for given interface
>>>> (https://github.com/canonical/netplan/blob/main/src/networkd.c#L636-L639).
>>>>
>>>> I'll submit a patch to phosphor-networkd later.
>>>
>>> I really don't think this is appropriate for all systems. Services
>>> have dependencies on network-online.target for a reason. If the
>>> side-effect of having the BMC network cable unplugged is that the
>>> host doesn't boot, that might be entirely reasonable behavior in
>>> some environments.
>>>
>>> We use rsyslog as the mechanism to offload our BMC logging data to an
>>> aggregation point. When you have a very large scale deployment, it is
>>> actually better for the system to not come online than for us to lose
>>> out on that data, since we have spare capacity to take its place.
>>
>> My understanding is that in OpenBMC, the purpose of using rsyslog is
>> to format the Redfish and IPMI SEL logs from the system journal. The
>> "r" of rsyslogd is not used in most cases. I think the "network not
>> available" case can be handled the same as "server misconfigured" in
>> rsyslogd, as in both cases it fails to connect to the server, and may
>> exit or print some error messages? (not tried yet)
>>
>> Johnathan mentions in his initial email that the 120s wait blocks
>> multi-user.target. Considering that there is no BMC serial port in
>> most production hardware, when the BMC has no network connection, the
>> only way to interact with it is to use IPMI from the host. However,
>> IPMI services are started in multi-user.target; if the BMC waits
>> indefinitely for the network to come online, there would be no way to
>> debug the issue.
>>
>>> Note that the Canonical netplan only applies this option if the
>>> configuration indicates that the interface is optional, which is
>>> entirely appropriate. The way you wrote it could have been
>>> interpreted that they set this on *every* interface, which is what it
>>> seems like you're proposing to do to phosphor-networkd
>>>
>>> If this is desired behavior for someone, can't you supply a wildcard
>>> .network file that adds this option, rather than modifying
>>> phosphor-networkd to manually add it to each network interface that
>>> it is managing?
>>
>> Maybe we can add a similar DBus property like how netplan does?
>> Reading/writing systemd-networkd config files is feasible in
>> phosphor-networkd. Default value can be assigned via build option.
>>
>>> I believe some designs use a USB network device to connect two
>>> internal pieces of the system and those interfaces are not
>>> necessarily managed by phosphor-networkd (interfaces that, for
>>> example connect BMC-to-BMC or BMC-to-Host). While it is obviously up
>>> to the system designer to work through this bug, by applying this
>>> configuration as you proposed you are causing unusual default
>>> behavior in that networkd is going to start waiting for these
>>> internal connections to come online instead of the external
>>> interface.
>>
>> I think this is an extremely rare case; internal interfaces should
>> be configurable. For example, the host OS can change the IP of its
>> BMC-Host virtual interface, and the BMC should also be able to change
>> its own; for BMC-to-BMC interfaces, it is impossible to assign a
>> fixed LAN IP without conflicts in manufacturing. The easiest way to
>> configure it is to use phosphor-networkd.
>>
>> Even if it is not managed by phosphor-networkd, keeping the default
>> RequiredForOnline=yes will cause the 120s wait on BMC boot.
>> Developers can simply search for it and find the solution. I remember
>> it shows a timer with a message on the BMC serial console; that's how
>> I found I should set "optional" on my Ubuntu server.
>
>FWIW, my experimentation with systemd-networkd-wait-online was not
>successful in doing much to change the 120 second timeout.
>
>Setting the RequiredForOnline entry to false in systemd.network did
>not prevent the 120 second timeout from elapsing.
>
>Setting any of the following switches in the service file failed to
>eliminate the timeout:
>--ignore=eth0
>--interface=eth0:no-carrier # overrides RequiredForOnline
>--interface=eth0:no-carrier:no-carrier # <- probably a bad setting in
> # hindsight
>
>It appears systemd-networkd-wait-online expects some state greater
>than no-carrier to consider the link online, thus allowing it to exit
>with a SUCCESS error code. This even when explicitly instructed
>no-carrier is defined as "online".
>
>The only switch that seemed to perform as expected in this instance
>was --timeout. Assigning a value less than 120 to the --timeout
>control did reduce the wait period. It does assign a SUCCESS error
>code upon timing out, which is expected behavior.
>
>systemd-networkd-wait-online appears to have logic preventing
>no-carrier state from being assigned as the "network online" value.
>
>rsyslogd has both a network and network-online target. If the
>network-online target is removed then systemd-networkd-wait-online
>doesn't run, and any configuration of that service appears to be
>pointless. The conclusion I have from that is that
>network-online.target is a valid configuration option for a service
>to assign.
>
>There may be openbmc powered servers that do use the distributed
>logging provided by rsyslogd. If there are then globally removing
>network-online from the rsyslog service file is undesirable. I
>consider the same to be true of assigning a default
>RequiredForOnline=false.
>
>Based on the above, it's my opinion this is a vendor based decision
>for how to configure rsyslog/systemd-networkd-wait-online.
>


I just wanted to point out that for those using the kernel NCSI stack, 
the networks always show as online and link up because of how 
the stack was created.  My reading is it would take a new slave 
interface to overcome this limitation.

Milton


* Re: Checking for network online
  2022-02-24 20:09               ` Patrick Williams
@ 2022-03-02  6:15                 ` Jiaqing Zhao
  0 siblings, 0 replies; 18+ messages in thread
From: Jiaqing Zhao @ 2022-03-02  6:15 UTC (permalink / raw)
  To: Patrick Williams, Johnathan Mantey; +Cc: Jeremy Kerr, OpenBMC Maillist, Lei Yu



On 2022-02-25 04:09, Patrick Williams wrote:
> On Wed, Feb 23, 2022 at 12:04:12PM -0800, Johnathan Mantey wrote:
>> On 2/23/22 09:44, Jiaqing Zhao wrote:
>>> On 2022-02-23 21:48, Patrick Williams wrote:
>>>> On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
> 
>> There may be openbmc powered servers that do use the distributed logging 
>> provided by rsyslogd. If there are then globally removing network-online 
>> from the rsyslog service file is undesirable. I consider the same to be 
>> true of assigning a default RequiredForOnline=false.
>>
>> Based on the above, it's my opinion this is a vendor based decision for 
>> how to configure rsyslog/systemd-networkd-wait-online.
> 
> I agree we shouldn't enable this globally, but that doesn't mean we can't add
> a simple PKGCONFIG that allows it to be enabled/disabled as needed.  That way
> we only have the single `PKGCONFIG:append` line in vendor layers and vendors
> that have a problem with it can leave it same as upstream.

I am also in favor of this solution: let the vendor decide, via the PKGCONFIG option, whether rsyslog depends on network-online.target.


* Re: Checking for network online
  2022-02-17 22:54 Checking for network online Johnathan Mantey
  2022-02-18  0:11 ` Jeremy Kerr
  2022-02-18 19:04 ` Doman, Jonathan
@ 2022-03-02  6:24 ` Ratan Gupta
  2 siblings, 0 replies; 18+ messages in thread
From: Ratan Gupta @ 2022-03-02  6:24 UTC (permalink / raw)
  To: Johnathan Mantey; +Cc: OpenBMC Maillist

[-- Attachment #1: Type: text/plain, Size: 1905 bytes --]

Hi Johnathan,

Can you ask this in the systemd community? They may be able to offer a workaround.

Ratan



On Fri, Feb 18, 2022 at 4:26 AM Johnathan Mantey <
johnathanx.mantey@intel.com> wrote:

> Intel has recently run into an issue regarding a systemd service, and
> we're interested in soliciting feedback from the community.
>
> Issue: systemd-networkd-wait-online.service stalls for 120 seconds when
> the managed NICs do not have a network connection.
>
> TLDR: Should OpenBMC remove systemd-networkd-wait-online.service
> universally?
>
> System Config: All NICs in the system are not connected to an active
> network.
>
> Test Process: The system under test (SUT) has AC removed, and some time
> later AC applied. Wait for BMC/BIOS to boot
>
> Behavior: U-Boot hands control to the Linux boot process, and the
> systemd services are started. When systemd-networkd-wait-online.service
> starts it stalls waiting for the NICs to enter a fully functional state.
> This never happens during the default 120 second timeout period for this
> service. When the timeout elapses, an error message is logged to the
> journal reporting the service exited unsuccessfully.
>
> Issues: This service blocks entry to multi-user.target.
> phosphor-state-manager uses multi-user.target to report the BMC is ready
> to use.
> This is reported via IPMI Get Device ID.
> The Intel BIOS is blocked from booting until
> systemd-networkd-wait-online times out.
> BMC entry to multi-user.target is delayed. Journal entries are created.
>
> Question for the community: Given the negative side effects caused by
> running this service does the community want to have this service
> collectively removed from global build image?
>
> --
> Johnathan Mantey
> Senior Software Engineer
> *azad technology partners*
> Contributing to Technology Innovation since 1992
> Phone: (503) 712-6764
> Email: johnathanx.mantey@intel.com
>
>

[-- Attachment #2: Type: text/html, Size: 2473 bytes --]


Thread overview: 18+ messages
2022-02-17 22:54 Checking for network online Johnathan Mantey
2022-02-18  0:11 ` Jeremy Kerr
2022-02-18  2:29   ` Lei Yu
2022-02-18 16:11     ` Johnathan Mantey
2022-02-23  2:09       ` Jiaqing Zhao
2022-02-23 13:48         ` Patrick Williams
2022-02-23 17:44           ` Jiaqing Zhao
2022-02-23 18:36             ` Bills, Jason M
2022-02-23 18:58               ` Patrick Williams
2022-02-23 18:55             ` Patrick Williams
2022-02-23 20:04             ` Johnathan Mantey
2022-02-24 20:09               ` Patrick Williams
2022-03-02  6:15                 ` Jiaqing Zhao
2022-03-01 19:56             ` Milton Miller II
2022-02-18 19:04 ` Doman, Jonathan
2022-02-18 19:39   ` Johnathan Mantey
2022-02-23 13:58     ` Patrick Williams
2022-03-02  6:24 ` Ratan Gupta
