From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.6 required=3.0 tests=BAYES_00,DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF4FEC4708A for ; Wed, 26 May 2021 06:52:33 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E50F6613E1 for ; Wed, 26 May 2021 06:52:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E50F6613E1 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=openbmc-bounces+openbmc=archiver.kernel.org@lists.ozlabs.org Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4FqhTW2SQgz301D for ; Wed, 26 May 2021 16:52:31 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20161025 header.b=DMNDkSrr; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::b30; helo=mail-yb1-xb30.google.com; envelope-from=nanzhou@google.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20161025 header.b=DMNDkSrr; dkim-atps=neutral Received: from mail-yb1-xb30.google.com (mail-yb1-xb30.google.com [IPv6:2607:f8b0:4864:20::b30]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4FqhSx3ppCz2xgF for ; Wed, 26 May 2021 16:51:59 +1000 (AEST) Received: by mail-yb1-xb30.google.com with SMTP id g38so548705ybi.12 for ; Tue, 25 May 2021 23:51:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=8kgRqAOZB84ntHe6RMg9vMghuhHNJwnc9+apBpiI0PU=; b=DMNDkSrr/boIlh1GqQSnmMiTCu9gyIgOIGLgQH9D+kBjnLVNWowuip9SeT9ZrnaYmj EMTF+hoLeW5DeUEdwtrZirxCGPI/7odKsmT2qrmxGL5IQtHTDQRnSjN76V0v+fDdXAt8 hexc6vcERe4GgSPG7ZIDpWnKw3wW7xb8RARMOLeUs8ck7bCKU642edyA2ANEioi80Pd/ MM6WK2JSCMG2qgV/F4nXlGzyR7jVQ/bmBfJv2FY56JZgrBAqB9kL2H/BK0htFhR1ZCix 7gc6KanVym/5LZDAeZdyaCO/PquHn4f8+U2BHlhlPX2QYmOTgI+MyWJ5Hsy3IbnLnCaV xsAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=8kgRqAOZB84ntHe6RMg9vMghuhHNJwnc9+apBpiI0PU=; b=oFQ4J76xjKVLXc2FNxKzxKqnXH4Tox1FELQp1ed0arsOiI6YnRKbwuIz3cECSQUY+y rE7/NjuiCKLK9N1NG4lnOXDWRRJzl4A96MWiSecUw2RxEkY5HVHgt97VIQGcivj5/thh IZXfz9QQCytPIWLRLVZMAWIy17efSdW2hDSC3+ax7E9cExMvs8xF7XHcyooY1rie9bOG UTHkvbuw/hZK+lBnyYTJjH0rfteqxi1Ze6DLcpkDlf5r6hRJ5OHhzI5KGsiG7aM2S6cq 3J6J1A2/nOmnRXbUKie4Me0dMLxhmfjmcC434n9h1akcgkCIOG863tzF2t7UJ3SMUfPN p5+Q== X-Gm-Message-State: AOAM533HenOUm4x6i3UeH7A4XZzTsI233jAQnHO+tmHJyofgMRzWpHVm /6kgEZsRYl7okJnW+oaYEBsxeJORDOYPItvH1TQqew== X-Google-Smtp-Source: ABdhPJzRhtOcAkl/ATBCvWgwjGm5LLaiPjPd3AoBVvfSbHsfCpOVMjt3ZHv5A8ETMtNVR11ptePEA+WjQVcgBQfSg4U= X-Received: by 2002:a25:bad0:: with SMTP id a16mr48073806ybk.441.1622011915339; Tue, 25 May 2021 23:51:55 -0700 (PDT) MIME-Version: 1.0 References: <20210521061023.4zy5s7fzycz5lppx@gmail.com> <20210524075248.m7mimyya42xas3nx@gmail.com> <20210525064127.xfgo5dceyvzrzpy6@gmail.com> <20210526061119.b7n2beqthbemsoba@gmail.com> In-Reply-To: <20210526061119.b7n2beqthbemsoba@gmail.com> From: Nan Zhou Date: Tue, 25 May 2021 23:51:44 -0700 Message-ID: Subject: Re: Link phosphor-hostlogger and bmcweb To: Artem Senichev Content-Type: multipart/alternative; boundary="00000000000009776e05c3361428" X-BeenThere: openbmc@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Development list for OpenBMC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Spencer Ku , Litzung Chen , OpenBMC Maillist , Alexander Amelkin , Ed Tanous , Richard Hanley , a.senichev@yadro.com, a.filippov@yadro.com Errors-To: openbmc-bounces+openbmc=archiver.kernel.org@lists.ozlabs.org Sender: "openbmc" --00000000000009776e05c3361428 Content-Type: text/plain; charset="UTF-8" > > 1. log_buffer.xpp: > > will be removed; we propose to persist a chunk into the log file as soon > as > > we receive it; newline will be logged as is, so logs are still splitted > by > > newlines; > If you remove the newline handler we will lose the timestamps added for > each > line. Good catch. If timestamps are necessary (it is necessary for the Redfish LogEntry), I think we can keep the current newline detection logic in log_buffer.xpp. Once a new line is detected, we append the current timestamp first, and then append the bytes chunk. > we can potentially set the maximum length of a log as well (force > > split a long line into smaller lines) > I would leave the host output "as is". I don't think we should modify it. OK. We have concerns about edge cases where a log line is too long. > 2. host_console.xpp: stay unchanged; > > > > 3. zlib_file.xpp, zlib_exception.xpp: > > will be removed or slightly changed; we can potentially use the linux > > logrotate which has built-in compression and file rotation (in this case > > these compression utilities will be removed). > > The latest log file isn't compressed any more. History log files are > still > > compressed. > Just curious, how are you going to remove the oldest messages from the > latest > file in runtime? You are not going to rewrite the entire file on every > input > character, are you? The following is my current idea: we will rename the latest file to something else and notify the writer (hostlogger) to close its old file descriptor and open a new one (should be doable via linux logrotate and inotify or some signal handlers, as logrotate is able to send some signals to hostlogger if a rotation is performed). The writer keeps appending logs most of the time using the same fd unless the latest file is rotated. This should be better than truncating the file where the reader (BMCWeb) won't have race conditions (it might read old snapshots but it is not a big deal in our case). > We need to implement some codes to deal with the race condition in log > > rotations (we should reopen the writing fd after the latest log file is > > renamed; inotify might do the trick). > > We might also need to rotate according to host boot cycles (like what > > postcodes are doing right now). > > > > 3. dbus_loop.xpp: stay unchanged; > > > > 4. service.xpp: > > will be slightly changed; socket IO callback and host state watcher are > > kept; manual flash or flash upon service restart will be removed > The manual flush is used in the dreport, please don't forget to remove that > call there. Thanks for the information. > 5. config.xpp: > > will be slightly changed; options for flash policy will be removed; > Not slightly =) > Most options are related to the buffer and flush policies. > As I can see, there will stay the socket Id and output directory only. Yes. I would assume the host state policy will stay as well. How do you like the proposal now? I hope we make it more clear. Do you think it is a good direction that we can go with? On Tue, May 25, 2021 at 11:11 PM Artem Senichev wrote: > On Tue, May 25, 2021 at 10:29:48PM -0700, Nan Zhou wrote: > > Hi Artem, > > > > I listed the potential changes below (to the best of my understand of > > phosphor-hostlogger), > > > > 1. log_buffer.xpp: > > will be removed; we propose to persist a chunk into the log file as soon > as > > we receive it; newline will be logged as is, so logs are still splitted > by > > newlines; > > If you remove the newline handler we will lose the timestamps added for > each > line. > > > we can potentially set the maximum length of a log as well (force > > split a long line into smaller lines) > > I would leave the host output "as is". I don't think we should modify it. > > > 2. host_console.xpp: stay unchanged; > > > > 3. zlib_file.xpp, zlib_exception.xpp: > > will be removed or slightly changed; we can potentially use the linux > > logrotate which has built-in compression and file rotation (in this case > > these compression utilities will be removed). > > The latest log file isn't compressed any more. History log files are > still > > compressed. > > Just curious, how are you going to remove the oldest messages from the > latest > file in runtime? You are not going to rewrite the entire file on every > input > character, are you? > > > We need to implement some codes to deal with the race condition in log > > rotations (we should reopen the writing fd after the latest log file is > > renamed; inotify might do the trick). > > We might also need to rotate according to host boot cycles (like what > > postcodes are doing right now). > > > > 3. dbus_loop.xpp: stay unchanged; > > > > 4. service.xpp: > > will be slightly changed; socket IO callback and host state watcher are > > kept; manual flash or flash upon service restart will be removed > > The manual flush is used in the dreport, please don't forget to remove that > call there. > > > 5. config.xpp: > > will be slightly changed; options for flash policy will be removed; > > Not slightly =) > Most options are related to the buffer and flush policies. > As I can see, there will stay the socket Id and output directory only. > > > We might split the implementation into several phases. We might not jump > to > > the final version at one iteration. But the above are what we eventually > > want at this moment. > > > > On Tue, May 25, 2021 at 8:03 AM Nan Zhou wrote: > > > > > "we propose to remove the ring buffer and write to the log file as soon > > >> as some characters are received. This implicitly removes the needs of > > >> the ring buffer, and the persistence triggering (host reboot, sigterm, > > >> etc) in hostlogger" > > >> I would like to get a more detailed description of further changes in > > >> order > > >> to see the whole picture of the solution. > > > > > > Originally I didn't consider including changing the log file > according to > > > the boot cycle; we will keep part of the dbus/signal watcher to make > that > > > different reboot posts to a different log file. > > > > > > We will work out some more detailed descriptions for future changes > soon. > > > > > > On Mon, May 24, 2021 at 11:41 PM Artem Senichev > > > wrote: > > > > > >> Sorry guys, maybe this is a misunderstanding on my part. > > >> > > >> I was confused with the following line in the proposal: > > >> > > >> "we propose to remove the ring buffer and write to the log file as > soon > > >> as some characters are received. This implicitly removes the needs of > > >> the ring buffer, and the persistence triggering (host reboot, sigterm, > > >> etc) in hostlogger" > > >> > > >> I would like to get a more detailed description of further changes in > > >> order > > >> to see the whole picture of the solution. > > >> > > >> -- > > >> Regards, > > >> Artem Senichev > > >> Software Engineer, YADRO. > > >> > > >> > > >> On Mon, May 24, 2021 at 09:27:39AM -0700, Ed Tanous wrote: > > >> > On Mon, May 24, 2021 at 12:52 AM Artem Senichev > > > >> wrote: > > >> > > > > >> > > I'll try to convey the main idea that we tried to implement in > this > > >> service. > > >> > > > > >> > > Hostlogger was originally designed to work with OpenPOWER systems, > > >> which > > >> > > generate a very detailed log at boot time. > > >> > > > >> > There are definitely other non OpenPOWER systems that have this same > > >> behavior. > > >> > > > >> > > It is important to save these logs and the host console output > just > > >> before > > >> > > rebooting for further investigation of incidents when hardware > errors > > >> occur. > > >> > > So, we have two log files for each server session (boot log + > last OS > > >> messages). > > >> > > That's why we need a D-bus watcher. > > >> > > BMC flash has around 3MiB of free RW space, this force us to use > > >> compression > > >> > > and file rotation. > > >> > > > > >> > > All of these features are unnecessary for "streaming" real-time > log > > >> recording. > > >> > > > >> > I disagree with you there. Rotation and compression are still > useful > > >> > in a "streaming" case. Because of the ways the APIs are defined, > > >> > LogService in redfish provides both a "historical" version of past > > >> > logs. It's useful to have those logs rotated and compressed. > > >> > > > >> > > You don't need DBus watchers, rotation can be done with native > Linux > > >> utilities, > > >> > > you don't even need to split the input stream into lines. > > >> > > > >> > I'm not following why those now wouldn't be needed. Redfish > LogEntry > > >> > would separate per line, so we'd have to do the splitting somewhere. > > >> > There's already code to do that in hostlogger. Wouldn't you still > > >> > want to separate log per boot, and have lines split between log > files? > > >> > I'm not following why those would go away just because there's a > > >> > desire to poll for logs and get up to date information. > > >> > > > >> > > Just redirect obmc-console.log: `tail -f > /var/log/obmc-console.log > > > >> my.log`. > > >> > > > >> > This doesn't solve the problem presented. First of all, log > rotation > > >> > and compression are still needed. Also, it's desirable to have dbus > > >> > watchers and separate the logs per boot, such that they can end up > > >> > separated in the Redfish API. Also, in what you presented, my.log > > >> > would quickly and easily overflow the available space, as there's no > > >> > log rotation. > > >> > > > >> > > > > >> > > I understand your desire to add a new mode to the existing project > > >> instead of > > >> > > creating a new one. But there is very little in common between > these > > >> modes. > > >> > > > >> > I'm not following how they're all that different, see above about > > >> > needing very similar features. For the sake of argument, lets say > we > > >> > went with a totally different implementation, would it be able to > live > > >> > in the hostlogger repo to be able to reuse code where needed? > There's > > >> > a lot of code that I suspect will be identical. > > >> > > > >> > > Even reading the socket will have to be done separately, since it > is > > >> buffered > > >> > > for line splitting in the current implementation. > > >> > > In the end, only bb-recipe and the `main` function will remain in > the > > >> common. > > >> > > > > >> > > -- > > >> > > Regards, > > >> > > Artem Senichev > > >> > > Software Engineer, YADRO. > > >> > > > > >> > > On Fri, May 21, 2021 at 10:51:45AM -0700, Nan Zhou wrote: > > >> > > > > >> > > > > > > >> > > > > > > > >> > > > > > > we propose to remove the ring buffer and write to the log > file > > >> > > > > > > as soon as some characters are received. > > >> > > > > > > > >> > > > > > I am not sure it is a good idea. > > >> > > > > > The host can generate a lot of logs, but we have very > limited > > >> free space. > > >> > > > > This is a fair concern, but wouldn't the rollover limits make > > >> this not > > >> > > > > an issue? They seem like they could be easily configured. > > >> > > > > > >> > > > Right. Logrotate will be able to handle the rotation. Maximum > size, > > >> # log > > >> > > > files, and compression can be easily configured. > > >> > > > > > >> > > > > In > > >> > > > > > addition, such heavy traffic will lead to a quick breakdown > of > > >> the flash > > >> > > > > (most > > >> > > > > > available products are guaranteed to withstand around > 100,000 > > >> P/E cycles > > >> > > > > only). > > >> > > > > JFFS2 is wear leveled, and there are other BMC stacks that I > know > > >> of > > >> > > > > that implement this without any ill effects to flash > longevity, > > >> with > > >> > > > > that said, if Nan made the "last log on disk" feature > > >> configurable, > > >> > > > > would that alleviate your concerns? > > >> > > > > > >> > > > We also noticed that the obmc-server itself will buffer the log > a > > >> bit. Will > > >> > > > it still be a problem if we don't write a character at once but > a > > >> block of > > >> > > > them? > > >> > > > And as Ed said, we can also make this feature configurable. I > would > > >> imagine > > >> > > > the log buffer will remain if the "last log on disk" feature is > > >> disabled. > > >> > > > > > >> > > > > > >> > > > > > > > >> > > > > > > This implicitly removes the needs > > >> > > > > > > of the ring buffer, and the persistence triggering (host > > >> reboot, > > >> > > > > sigterm, > > >> > > > > > > etc) in hostlogger. We believe this keeps the same > > >> functionality but > > >> > > > > saves > > >> > > > > > > hundreds of lines of codes in phosphor-hostlogger. > > >> > > > > Difference of opinion here, I don't think this removes the > need > > >> for > > >> > > > > the host reboot event; Having each reboot post to a > different log > > >> > > > > needs to be maintained, and I have to imagine that there's > some > > >> sort > > >> > > > > of sigterm handler still, although it becomes a lot smaller. > > >> > > > > > >> > > > > > >> > > > > > > >> > > > > > You are suggesting to delete the buffer, DBus watcher, log > > >> rotate. How > > >> > > > > are you > > >> > > > > > going to keep the same functionality if you remove > everything > > >> related to > > >> > > > > it? > > >> > > > > +1. In the initial thought I didn't think we were removing > any > > >> > > > > functionality with this. I had assumed the dbus watcher would > > >> remain, > > >> > > > > and we would still have the log rotation behavior. In reading > > >> through > > >> > > > > Nans proposal I don't think these are getting removed; Maybe > I > > >> > > > > misunderstood? > > >> > > > > > >> > > > > > >> > > > Yes, if we want to keep different reboot posts to a different > log > > >> file, we > > >> > > > can keep part of the dbus/signal watcher. > > >> > > > > > >> > > > On Fri, May 21, 2021 at 10:24 AM Ed Tanous > > > >> wrote: > > >> > > > > > >> > > > > On Thu, May 20, 2021 at 11:10 PM Artem Senichev < > > >> artemsen@gmail.com> > > >> > > > > wrote: > > >> > > > > > > > >> > > > > > On Thu, May 20, 2021 at 04:29:09PM -0700, Nan Zhou wrote: > > >> > > > > > > Hi all, > > >> > > > > > > > > >> > > > > > > In the previous thread ( > > >> > > > > > > > > >> https://lists.ozlabs.org/pipermail/openbmc/2021-March/025234.html), > we > > >> > > > > > > (engineers from Google and Quanta) discussed our attempt > to > > >> share host > > >> > > > > > > serial logs via Redfish, which includes polling logs via > > >> LogService and > > >> > > > > > > streaming log bytes via EventService (e.g. SSE). We went > with > > >> the > > >> > > > > event log > > >> > > > > > > architecture > > >> > > > > > > < > > >> > > > > > > >> > https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md > > >> > > > > > > > >> > > > > > > and did the implementation. > > >> > > > > > > > > >> > > > > > > We still want to reuse the phosphor-hostlogger and do some > > >> > > > > modification. We > > >> > > > > > > believe it is better to try to reuse existing codes if > > >> possible and > > >> > > > > improve > > >> > > > > > > them rather than creating new things that have similar > > >> functionalities > > >> > > > > (in > > >> > > > > > > our case, it is a daemon that could collect logs and > persist > > >> them). > > >> > > > > > > > >> > > > > > I agree, reusing code is a right choice, but only when it is > > >> really > > >> > > > > possible. > > >> > > > > > For now it looks like you want to remove most of the > Hostlogger > > >> features > > >> > > > > to > > >> > > > > > transform it from buffer-like to stream-like service. > > >> > > > > > > >> > > > > I'm not quite following this statement. Which features are > > >> getting > > >> > > > > removed? From what I can see, he's suggesting making > > >> > > > > phosphor-hostlogger look more like a well-behaved linux > logging > > >> > > > > daemon, but I don't think any features got omitted (or I'm > missing > > >> > > > > something critical). > > >> > > > > > > >> > > > > > > > >> > > > > > > We want to do the following modification in > > >> phosphor-hostlogger (from > > >> > > > > the > > >> > > > > > > input and output point of view, just very few things will > be > > >> changed) > > >> > > > > > > > > >> > > > > > > 1. One limitation of phosphor-hostlogger is that it will > lose > > >> data in > > >> > > > > the > > >> > > > > > > memory (the ring buffer maintained by hostlogger) when BMC > > >> gets force > > >> > > > > > > restarted; > > >> > > > > > > > >> > > > > > When BMC goes to reboot it stops all services, at that > moment > > >> hostlogger > > >> > > > > gets > > >> > > > > > a signal and flushes all gathered logs to a file. > > >> > > > > > > >> > > > > Only if the reboot is planned. If the BMC loses power (which > is > > >> > > > > "normal" for a bmc) there isn't time to persist to flash > before > > >> the > > >> > > > > power goes down and the logs are most likely lost. > > >> > > > > > > >> > > > > > > > >> > > > > > > we propose to remove the ring buffer and write to the log > file > > >> > > > > > > as soon as some characters are received. > > >> > > > > > > > >> > > > > > I am not sure it is a good idea. > > >> > > > > > The host can generate a lot of logs, but we have very > limited > > >> free space. > > >> > > > > > > >> > > > > This is a fair concern, but wouldn't the rollover limits make > > >> this not > > >> > > > > an issue? They seem like they could be easily configured. > > >> > > > > > > >> > > > > > In > > >> > > > > > addition, such heavy traffic will lead to a quick breakdown > of > > >> the flash > > >> > > > > (most > > >> > > > > > available products are guaranteed to withstand around > 100,000 > > >> P/E cycles > > >> > > > > only). > > >> > > > > > > >> > > > > JFFS2 is wear leveled, and there are other BMC stacks that I > know > > >> of > > >> > > > > that implement this without any ill effects to flash > longevity, > > >> with > > >> > > > > that said, if Nan made the "last log on disk" feature > > >> configurable, > > >> > > > > would that alleviate your concerns? > > >> > > > > > > >> > > > > > > > >> > > > > > > This implicitly removes the needs > > >> > > > > > > of the ring buffer, and the persistence triggering (host > > >> reboot, > > >> > > > > sigterm, > > >> > > > > > > etc) in hostlogger. We believe this keeps the same > > >> functionality but > > >> > > > > saves > > >> > > > > > > hundreds of lines of codes in phosphor-hostlogger. > > >> > > > > > > >> > > > > Difference of opinion here, I don't think this removes the > need > > >> for > > >> > > > > the host reboot event; Having each reboot post to a > different log > > >> > > > > needs to be maintained, and I have to imagine that there's > some > > >> sort > > >> > > > > of sigterm handler still, although it becomes a lot smaller. > > >> > > > > > > >> > > > > > > > >> > > > > > You are suggesting to delete the buffer, DBus watcher, log > > >> rotate. How > > >> > > > > are you > > >> > > > > > going to keep the same functionality if you remove > everything > > >> related to > > >> > > > > it? > > >> > > > > > > >> > > > > +1. In the initial thought I didn't think we were removing > any > > >> > > > > functionality with this. I had assumed the dbus watcher would > > >> remain, > > >> > > > > and we would still have the log rotation behavior. In reading > > >> through > > >> > > > > Nans proposal I don't think these are getting removed; Maybe > I > > >> > > > > misunderstood? > > >> > > > > > > >> > > > > > > > >> > > > > > > 2. We propose not to compress the latest log file. This > saves > > >> us the > > >> > > > > > > overhead of doing decompression when BMCWeb just needs to > > >> retrieve the > > >> > > > > most > > >> > > > > > > recent logs. There are still going to be log rotations in > the > > >> file > > >> > > > > level. > > >> > > > > > > Files other than the latest log file are still going to be > > >> compressed. > > >> > > > > We > > >> > > > > > > can modify existing codes to achieve this or use the linux > > >> logrotate > > >> > > > > > > directly. > > >> > > > > > > > > >> > > > > > > Furthermore, we will add host serial logs into BMCWeb, > both > > >> LogService > > >> > > > > and > > >> > > > > > > EventService. In LogService, we will teach BMCWeb how to > read > > >> the > > >> > > > > latest > > >> > > > > > > log file that is not compressed and the other compressed > old > > >> logs, and > > >> > > > > how > > >> > > > > > > to assemble LogEntries out of raw serial logs. We will > discuss > > >> > > > > EventService > > >> > > > > > > in future threads but the very initial idea is to setup > > >> inotify on log > > >> > > > > > > files and SSE to stream out new bytes to clients (like > what > > >> the > > >> > > > > existing > > >> > > > > > > event logging is doing). > > >> > > > > > > > > >> > > > > > > As we said above, for phosphor-hostlogger, the input is > still > > >> the > > >> > > > > > > obmc-server unix socket, and the output are still > persisted > > >> log files. > > >> > > > > But > > >> > > > > > > the functionality will get improved (less data loss), code > > >> gets > > >> > > > > simplified > > >> > > > > > > (no ring buffer and persistence triggering), and it will > > >> become easier > > >> > > > > and > > >> > > > > > > more performant to get BMCWeb hooked up for log streaming > via > > >> Redfish. > > >> > > > > > > > > >> > > > > > > Please let us know what you think. We appreciate any > > >> feedback. Thank > > >> > > > > you > > >> > > > > > > very much! > > >> > > > > > > > > >> > > > > > > Sincerely, > > >> > > > > > > Nan > > >> > > > > > > > >> > > > > > -- > > >> > > > > > Regards, > > >> > > > > > Artem Senichev > > >> > > > > > Software Engineer, YADRO. > > >> > > > > > > >> > > > > > -- > Regards, > Artem Senichev > Software Engineer, YADRO. > --00000000000009776e05c3361428 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
=C2=A0
> 1. log_buffer.xpp:
> will be removed; we propose to persist a chunk into = the log file as soon as
> we receive it; newline will be logged as is, so logs are = still splitted by
> newlines;
If you remove the newline handler we will lose the timest= amps added for each
line.
Good catch. If timestamps are= necessary (it is necessary for the Redfish LogEntry), I think we can keep = the current newline detection logic in log_buffer.xpp. Once a new line is d= etected, we append the current timestamp first, and then append the bytes c= hunk.

> we can potentia= lly set the maximum length of a log as well (force
> split a long line into smaller= lines)
I would leave the host output "as is". I don't think we s= hould modify it.
OK. We have concerns about edge cases whe= re a log line is too long.=C2=A0=C2=A0

> 2. host_console.xpp: stay unchanged;
>
> 3. zlib_file.xpp, zlib_exception= .xpp:
> = will be removed or slightly changed; we can potentially use the linux
> logrotate w= hich has built-in compression and file rotation (in this case
> these compression u= tilities will be removed).
> The latest log file isn't compressed any more. His= tory log files are still
> compressed.
Just curious, how are you going to remove the ol= dest messages from the latest
file in runtime? You are not going to rewr= ite the entire file on every input
character, are you?
= The=C2=A0following is my current idea: we will rename the latest file to so= mething else and notify the writer (hostlogger) to close its old file descr= iptor and open a new one (should be doable via linux logrotate and inotify = or some signal handlers, as logrotate is able to send some signals to hostl= ogger if a rotation is performed). The writer keeps appending logs most of = the time using the same fd unless the latest file is rotated. This should b= e better=C2=A0than truncating the file where the reader (BMCWeb) won't = have race conditions (it might read old snapshots but it is not a big=C2=A0= deal in our case).=C2=A0

&= gt; We need to implement some codes to deal with the race condition in log<= br>> rotati= ons (we should reopen the writing fd after the latest log file is
> renamed; inotif= y might do the trick).
> We might also need to rotate according to host boot cycles= (like what
> postcodes are doing right now).
>
> 3. dbus_loop.xpp: stay unchanged;
>
> 4. service.xpp:
<= span class=3D"gmail-im" style=3D"color:rgb(80,0,80)">> will be slightly = changed; socket IO callback and host state watcher are
> kept; manual flash or flas= h upon service restart will be removed
The manual flush is used in the dreport,= please don't forget to remove that
call there.
Tha= nks for the information.

=C2=A0> 5. config= .xpp:
> = will be slightly changed; options for flash policy will be removed;<= span class=3D"gmail-im" style=3D"color:rgb(80,0,80)">
Not slightl= y =3D)
Most options are related to the buffer and flush policies.
As = I can see, there will stay the socket Id and output directory only.
Yes. I would assume the host state policy will stay as well.

How do you like=C2=A0the proposal now? I hope we make = it more clear. Do you think it is a good direction that we can go with?=C2= =A0 =C2=A0=C2=A0

On Tue, May 25, 2021 at 11:11 PM Artem Senichev <<= a href=3D"mailto:artemsen@gmail.com">artemsen@gmail.com> wrote:
<= /div>
On Tue, May 25, 2021= at 10:29:48PM -0700, Nan Zhou wrote:
> Hi Artem,
>
> I listed the potential changes below (to the best of my understand of<= br> > phosphor-hostlogger),
>
> 1. log_buffer.xpp:
> will be removed; we propose to persist a chunk into the log file as so= on as
> we receive it; newline will be logged as is, so logs are still splitte= d by
> newlines;

If you remove the newline handler we will lose the timestamps added for eac= h
line.

> we can potentially set the maximum length of a log as well (force
> split a long line into smaller lines)

I would leave the host output "as is". I don't think we shoul= d modify it.

> 2. host_console.xpp: stay unchanged;
>
> 3. zlib_file.xpp, zlib_exception.xpp:
> will be removed or slightly changed; we can potentially use the linux<= br> > logrotate which has built-in compression and file rotation (in this ca= se
> these compression utilities will be removed).
> The latest log file isn't compressed any more. History log files a= re still
> compressed.

Just curious, how are you going to remove the oldest messages from the late= st
file in runtime? You are not going to rewrite the entire file on every inpu= t
character, are you?

> We need to implement some codes to deal with the race condition in log=
> rotations (we should reopen the writing fd after the latest log file i= s
> renamed; inotify might do the trick).
> We might also need to rotate according to host boot cycles (like what<= br> > postcodes are doing right now).
>
> 3. dbus_loop.xpp: stay unchanged;
>
> 4. service.xpp:
> will be slightly changed; socket IO callback and host state watcher ar= e
> kept; manual flash or flash upon service restart will be removed

The manual flush is used in the dreport, please don't forget to remove = that
call there.

> 5. config.xpp:
> will be slightly changed; options for flash policy will be removed;
Not slightly =3D)
Most options are related to the buffer and flush policies.
As I can see, there will stay the socket Id and output directory only.

> We might split the implementation into several phases. We might not ju= mp to
> the final version at one iteration. But the above are what we eventual= ly
> want at this moment.
>
> On Tue, May 25, 2021 at 8:03 AM Nan Zhou <nanzhou@google.com> wrote:
>
> > "we propose to remove the ring buffer and write to the log f= ile as soon
> >> as some characters are received. This implicitly removes the = needs of
> >> the ring buffer, and the persistence triggering (host reboot,= sigterm,
> >> etc) in hostlogger"
> >> I would like to get a more detailed description of further ch= anges in
> >> order
> >> to see the whole picture of the solution.
> >
> >=C2=A0 Originally I didn't consider including changing the log= file according to
> > the boot cycle; we will keep part of the dbus/signal watcher to m= ake that
> > different reboot posts to a different log file.
> >
> > We will work out some more detailed descriptions for future chang= es soon.
> >
> > On Mon, May 24, 2021 at 11:41 PM Artem Senichev <artemsen@gmail.com>
> > wrote:
> >
> >> Sorry guys, maybe this is a misunderstanding on my part.
> >>
> >> I was confused with the following line in the proposal:
> >>
> >> "we propose to remove the ring buffer and write to the l= og file as soon
> >> as some characters are received. This implicitly removes the = needs of
> >> the ring buffer, and the persistence triggering (host reboot,= sigterm,
> >> etc) in hostlogger"
> >>
> >> I would like to get a more detailed description of further ch= anges in
> >> order
> >> to see the whole picture of the solution.
> >>
> >> --
> >> Regards,
> >> Artem Senichev
> >> Software Engineer, YADRO.
> >>
> >>
> >> On Mon, May 24, 2021 at 09:27:39AM -0700, Ed Tanous wrote: > >> > On Mon, May 24, 2021 at 12:52 AM Artem Senichev <artemsen@gmail.com&= gt;
> >> wrote:
> >> > >
> >> > > I'll try to convey the main idea that we tried = to implement in this
> >> service.
> >> > >
> >> > > Hostlogger was originally designed to work with Ope= nPOWER systems,
> >> which
> >> > > generate a very detailed log at boot time.
> >> >
> >> > There are definitely other non OpenPOWER systems that ha= ve this same
> >> behavior.
> >> >
> >> > > It is important to save these logs and the host con= sole output just
> >> before
> >> > > rebooting for further investigation of incidents wh= en hardware errors
> >> occur.
> >> > > So, we have two log files for each server session (= boot log + last OS
> >> messages).
> >> > > That's why we need a D-bus watcher.
> >> > > BMC flash has around 3MiB of free RW space, this fo= rce us to use
> >> compression
> >> > > and file rotation.
> >> > >
> >> > > All of these features are unnecessary for "str= eaming" real-time log
> >> recording.
> >> >
> >> > I disagree with you there.=C2=A0 Rotation and compressio= n are still useful
> >> > in a "streaming" case.=C2=A0 Because of the wa= ys the APIs are defined,
> >> > LogService in redfish provides both a "historical&q= uot; version of past
> >> > logs.=C2=A0 It's useful to have those logs rotated a= nd compressed.
> >> >
> >> > > You don't need DBus watchers, rotation can be d= one with native Linux
> >> utilities,
> >> > > you don't even need to split the input stream i= nto lines.
> >> >
> >> > I'm not following why those now wouldn't be need= ed.=C2=A0 Redfish LogEntry
> >> > would separate per line, so we'd have to do the spli= tting somewhere.
> >> > There's already code to do that in hostlogger.=C2=A0= Wouldn't you still
> >> > want to separate log per boot, and have lines split betw= een log files?
> >> >=C2=A0 I'm not following why those would go away just= because there's a
> >> > desire to poll for logs and get up to date information.<= br> > >> >
> >> > > Just redirect obmc-console.log: `tail -f /var/log/o= bmc-console.log >
> >> my.log`.
> >> >
> >> > This doesn't solve the problem presented.=C2=A0 Firs= t of all, log rotation
> >> > and compression are still needed.=C2=A0 Also, it's d= esirable to have dbus
> >> > watchers and separate the logs per boot, such that they = can end up
> >> > separated in the Redfish API.=C2=A0 Also, in what you pr= esented, my.log
> >> > would quickly and easily overflow the available space, a= s there's no
> >> > log rotation.
> >> >
> >> > >
> >> > > I understand your desire to add a new mode to the e= xisting project
> >> instead of
> >> > > creating a new one. But there is very little in com= mon between these
> >> modes.
> >> >
> >> > I'm not following how they're all that different= , see above about
> >> > needing very similar features.=C2=A0 For the sake of arg= ument, lets say we
> >> > went with a totally different implementation, would it b= e able to live
> >> > in the hostlogger repo to be able to reuse code where ne= eded?=C2=A0 There's
> >> > a lot of code that I suspect will be identical.
> >> >
> >> > > Even reading the socket will have to be done separa= tely, since it is
> >> buffered
> >> > > for line splitting in the current implementation. > >> > > In the end, only bb-recipe and the `main` function = will remain in the
> >> common.
> >> > >
> >> > > --
> >> > > Regards,
> >> > > Artem Senichev
> >> > > Software Engineer, YADRO.
> >> > >
> >> > > On Fri, May 21, 2021 at 10:51:45AM -0700, Nan Zhou = wrote:
> >> > >
> >> > > > >
> >> > > > > >
> >> > > > > > > we propose to remove the ring b= uffer and write to the log file
> >> > > > > > > as soon as some characters are = received.
> >> > > > > >
> >> > > > > > I am not sure it is a good idea.
> >> > > > > > The host can generate a lot of logs,= but we have very limited
> >> free space.
> >> > > > > This is a fair concern, but wouldn't = the rollover limits make
> >> this not
> >> > > > > an issue?=C2=A0 They seem like they could= be easily configured.
> >> > > >
> >> > > > Right. Logrotate will be able to handle the ro= tation. Maximum size,
> >> # log
> >> > > > files, and compression can be easily configure= d.
> >> > > >
> >> > > > > In
> >> > > > > > addition, such heavy traffic will le= ad to a quick breakdown of
> >> the flash
> >> > > > > (most
> >> > > > > > available products are guaranteed to= withstand around 100,000
> >> P/E cycles
> >> > > > > only).
> >> > > > > JFFS2 is wear leveled, and there are othe= r BMC stacks that I know
> >> of
> >> > > > > that implement this without any ill effec= ts to flash longevity,
> >> with
> >> > > > > that said, if Nan made the "last log= on disk" feature
> >> configurable,
> >> > > > > would that alleviate your concerns?
> >> > > >
> >> > > > We also noticed that the obmc-server itself wi= ll buffer the log a
> >> bit. Will
> >> > > > it still be a problem if we don't write a = character at once but a
> >> block of
> >> > > > them?
> >> > > > And as Ed said, we can also make this feature = configurable. I would
> >> imagine
> >> > > > the log buffer will remain if the "last l= og on disk" feature is
> >> disabled.
> >> > > >
> >> > > >
> >> > > > > >
> >> > > > > > > This implicitly removes the nee= ds
> >> > > > > > > of the ring buffer, and the per= sistence triggering (host
> >> reboot,
> >> > > > > sigterm,
> >> > > > > > > etc) in hostlogger. We believe = this keeps the same
> >> functionality but
> >> > > > > saves
> >> > > > > > > hundreds of lines of codes in p= hosphor-hostlogger.
> >> > > > > Difference of opinion here, I don't t= hink this removes the need
> >> for
> >> > > > > the host reboot event;=C2=A0 Having each = reboot post to a different log
> >> > > > > needs to be maintained, and I have to ima= gine that there's some
> >> sort
> >> > > > > of sigterm handler still, although it bec= omes a lot smaller.
> >> > > >
> >> > > >
> >> > > > >
> >> > > > > > You are suggesting to delete the buf= fer, DBus watcher, log
> >> rotate. How
> >> > > > > are you
> >> > > > > > going to keep the same functionality= if you remove everything
> >> related to
> >> > > > > it?
> >> > > > > +1.=C2=A0 In the initial thought I didn&#= 39;t think we were removing any
> >> > > > > functionality with this.=C2=A0 I had assu= med the dbus watcher would
> >> remain,
> >> > > > > and we would still have the log rotation = behavior.=C2=A0 In reading
> >> through
> >> > > > > Nans proposal I don't think these are= getting removed;=C2=A0 Maybe I
> >> > > > > misunderstood?
> >> > > >
> >> > > >
> >> > > > Yes, if we want to keep different reboot posts= to a different log
> >> file, we
> >> > > > can keep part of the dbus/signal watcher.
> >> > > >
> >> > > > On Fri, May 21, 2021 at 10:24 AM Ed Tanous <= ;edtanous@google.c= om>
> >> wrote:
> >> > > >
> >> > > > > On Thu, May 20, 2021 at 11:10 PM Artem Se= nichev <
> >> artem= sen@gmail.com>
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > On Thu, May 20, 2021 at 04:29:09PM -= 0700, Nan Zhou wrote:
> >> > > > > > > Hi all,
> >> > > > > > >
> >> > > > > > > In the previous thread (
> >> > > > > > >
> >> https://lists.ozlabs.= org/pipermail/openbmc/2021-March/025234.html), we
> >> > > > > > > (engineers from Google and Quan= ta) discussed our attempt to
> >> share host
> >> > > > > > > serial logs via Redfish, which = includes polling logs via
> >> LogService and
> >> > > > > > > streaming log bytes via EventSe= rvice (e.g. SSE). We went with
> >> the
> >> > > > > event log
> >> > > > > > > architecture
> >> > > > > > > <
> >> > > > >
> >> h= ttps://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-= bmcweb.md
> >> > > > > >
> >> > > > > > > and did the implementation.
> >> > > > > > >
> >> > > > > > > We still want to reuse the phos= phor-hostlogger and do some
> >> > > > > modification. We
> >> > > > > > > believe it is better to try to = reuse existing codes if
> >> possible and
> >> > > > > improve
> >> > > > > > > them rather than creating new t= hings that have similar
> >> functionalities
> >> > > > > (in
> >> > > > > > > our case, it is a daemon that c= ould collect logs and persist
> >> them).
> >> > > > > >
> >> > > > > > I agree, reusing code is a right cho= ice, but only when it is
> >> really
> >> > > > > possible.
> >> > > > > > For now it looks like you want to re= move most of the Hostlogger
> >> features
> >> > > > > to
> >> > > > > > transform it from buffer-like to str= eam-like service.
> >> > > > >
> >> > > > > I'm not quite following this statemen= t.=C2=A0 Which features are
> >> getting
> >> > > > > removed?=C2=A0 From what I can see, he= 9;s suggesting making
> >> > > > > phosphor-hostlogger look more like a well= -behaved linux logging
> >> > > > > daemon, but I don't think any feature= s got omitted (or I'm missing
> >> > > > > something critical).
> >> > > > >
> >> > > > > >
> >> > > > > > > We want to do the following mod= ification in
> >> phosphor-hostlogger (from
> >> > > > > the
> >> > > > > > > input and output point of view,= just very few things will be
> >> changed)
> >> > > > > > >
> >> > > > > > > 1. One limitation of phosphor-h= ostlogger is that it will lose
> >> data in
> >> > > > > the
> >> > > > > > > memory (the ring buffer maintai= ned by hostlogger) when BMC
> >> gets force
> >> > > > > > > restarted;
> >> > > > > >
> >> > > > > > When BMC goes to reboot it stops all= services, at that moment
> >> hostlogger
> >> > > > > gets
> >> > > > > > a signal and flushes all gathered lo= gs to a file.
> >> > > > >
> >> > > > > Only if the reboot is planned.=C2=A0 If t= he BMC loses power (which is
> >> > > > > "normal" for a bmc) there isn&#= 39;t time to persist to flash before
> >> the
> >> > > > > power goes down and the logs are most lik= ely lost.
> >> > > > >
> >> > > > > >
> >> > > > > > > we propose to remove the ring b= uffer and write to the log file
> >> > > > > > > as soon as some characters are = received.
> >> > > > > >
> >> > > > > > I am not sure it is a good idea.
> >> > > > > > The host can generate a lot of logs,= but we have very limited
> >> free space.
> >> > > > >
> >> > > > > This is a fair concern, but wouldn't = the rollover limits make
> >> this not
> >> > > > > an issue?=C2=A0 They seem like they could= be easily configured.
> >> > > > >
> >> > > > > > In
> >> > > > > > addition, such heavy traffic will le= ad to a quick breakdown of
> >> the flash
> >> > > > > (most
> >> > > > > > available products are guaranteed to= withstand around 100,000
> >> P/E cycles
> >> > > > > only).
> >> > > > >
> >> > > > > JFFS2 is wear leveled, and there are othe= r BMC stacks that I know
> >> of
> >> > > > > that implement this without any ill effec= ts to flash longevity,
> >> with
> >> > > > > that said, if Nan made the "last log= on disk" feature
> >> configurable,
> >> > > > > would that alleviate your concerns?
> >> > > > >
> >> > > > > >
> >> > > > > > > This implicitly removes the nee= ds
> >> > > > > > > of the ring buffer, and the per= sistence triggering (host
> >> reboot,
> >> > > > > sigterm,
> >> > > > > > > etc) in hostlogger. We believe = this keeps the same
> >> functionality but
> >> > > > > saves
> >> > > > > > > hundreds of lines of codes in p= hosphor-hostlogger.
> >> > > > >
> >> > > > > Difference of opinion here, I don't t= hink this removes the need
> >> for
> >> > > > > the host reboot event;=C2=A0 Having each = reboot post to a different log
> >> > > > > needs to be maintained, and I have to ima= gine that there's some
> >> sort
> >> > > > > of sigterm handler still, although it bec= omes a lot smaller.
> >> > > > >
> >> > > > > >
> >> > > > > > You are suggesting to delete the buf= fer, DBus watcher, log
> >> rotate. How
> >> > > > > are you
> >> > > > > > going to keep the same functionality= if you remove everything
> >> related to
> >> > > > > it?
> >> > > > >
> >> > > > > +1.=C2=A0 In the initial thought I didn&#= 39;t think we were removing any
> >> > > > > functionality with this.=C2=A0 I had assu= med the dbus watcher would
> >> remain,
> >> > > > > and we would still have the log rotation = behavior.=C2=A0 In reading
> >> through
> >> > > > > Nans proposal I don't think these are= getting removed;=C2=A0 Maybe I
> >> > > > > misunderstood?
> >> > > > >
> >> > > > > >
> >> > > > > > > 2. We propose not to compress t= he latest log file. This saves
> >> us the
> >> > > > > > > overhead of doing decompression= when BMCWeb just needs to
> >> retrieve the
> >> > > > > most
> >> > > > > > > recent logs. There are still go= ing to be log rotations in the
> >> file
> >> > > > > level.
> >> > > > > > > Files other than the latest log= file are still going to be
> >> compressed.
> >> > > > > We
> >> > > > > > > can modify existing codes to ac= hieve this or use the linux
> >> logrotate
> >> > > > > > > directly.
> >> > > > > > >
> >> > > > > > > Furthermore, we will add host s= erial logs into BMCWeb, both
> >> LogService
> >> > > > > and
> >> > > > > > > EventService. In LogService, we= will teach BMCWeb how to read
> >> the
> >> > > > > latest
> >> > > > > > > log file that is not compressed= and the other compressed old
> >> logs, and
> >> > > > > how
> >> > > > > > > to assemble LogEntries out of r= aw serial logs. We will discuss
> >> > > > > EventService
> >> > > > > > > in future threads but the very = initial idea is to setup
> >> inotify on log
> >> > > > > > > files and SSE to stream out new= bytes to clients (like what
> >> the
> >> > > > > existing
> >> > > > > > > event logging is doing).
> >> > > > > > >
> >> > > > > > > As we said above, for phosphor-= hostlogger, the input is still
> >> the
> >> > > > > > > obmc-server unix socket, and th= e output are still persisted
> >> log files.
> >> > > > > But
> >> > > > > > > the functionality will get impr= oved (less data loss), code
> >> gets
> >> > > > > simplified
> >> > > > > > > (no ring buffer and persistence= triggering), and it will
> >> become easier
> >> > > > > and
> >> > > > > > > more performant to get BMCWeb h= ooked up for log streaming via
> >> Redfish.
> >> > > > > > >
> >> > > > > > > Please let us know what you thi= nk. We appreciate any
> >> feedback. Thank
> >> > > > > you
> >> > > > > > > very much!
> >> > > > > > >
> >> > > > > > > Sincerely,
> >> > > > > > > Nan
> >> > > > > >
> >> > > > > > --
> >> > > > > > Regards,
> >> > > > > > Artem Senichev
> >> > > > > > Software Engineer, YADRO.
> >> > > > >
> >>
> >

--
Regards,
Artem Senichev
Software Engineer, YADRO.
--00000000000009776e05c3361428--