openbmc.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* ipmitool fru write 0 - does not update "baseboard" FRU
@ 2024-05-02 18:31 Oskar Senft
  2024-05-03 15:38 ` Willy Tu
  0 siblings, 1 reply; 7+ messages in thread
From: Oskar Senft @ 2024-05-02 18:31 UTC (permalink / raw)
  To: OpenBMC Maillist; +Cc: Ed Tanous, Ali El-Haj-Mahmoud

Hi everyone

tl;dr - Attempting "ipmitool fru write" with an input file that
contains additional bytes beyond the actual FRU data does not actually
update the FRU on OpenBMC at head w/ entity-manager.

Details:

I ran into an issue with updating the "baseboard" FRU (0), which is
stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
specific to FRU 0 and could apply to other updateable FRUs in the same
way.


1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
contains chassis type (that's required for entity-manager's fru-device
to recognize the file.

root@akita:~# ls -al /etc/fru/baseboard.fru.bin
-rw-r--r--    1 root     root           512 Sep 21 05:21
/etc/fru/baseboard.fru.bin

root@akita:~# hexdump -C /etc/fru/baseboard.fru.bin
00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200

root@akita:~# ipmitool fru print 0
 Chassis Type          : Rack Mount Chassis
 Chassis Area Checksum : OK


2. Prepare a FRU file with additional data, e.g. with frugy:

(frugy) osk@osk:~$ cat demo.yml
ChassisInfo:
  type: 23
  part_number: '4711'
  serial_number: '12345'

(frugy) osk@osk:~$ frugy demo.yml -o demo.bin

(frugy) osk@osk:~$ ls -al demo.bin
-rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin

(frugy) osk@osk:~$ hexdump -C demo.bin
00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31  |............4711|
00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
00000018

Note that frugy generates a minimal binary by default. However, other
processes may not and instead produce a fixed size binary blob. This
happens, e.g. when performing "ipmitool fru read" which returns the
whole contents of the underlying storage. A subsequent "ipmitool fru
write" with that blob will fail as explained here.

To simulate this here, increase the file to 256 bytes:

(frugy) osk@osk:~$ cp demo.bin demo-256.bin
(frugy) osk@osk:~$ dd if=/dev/zero bs=1 seek=256 count=0 of=demo-256.bin
0+0 records in
0+0 records out
0 bytes copied, 5.1138e-05 s, 0.0 kB/s

(frugy) osk@osk:~$ ls -al demo-256.bin
-rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin

(frugy) osk@osk:~$ hexdump -C demo-256.bin
00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31  |............4711|
00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00  |.12345..........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100


3. Write the newly generated FRU:

root@akita:~# ipmitool fru write 0 demo-256.bin
Fru Size         : 512 bytes
Size to Write    : 256 bytes

Wait ~10 seconds for the fru-device process to reload the contents.


4. Re-read the FRU contents:

root@akita:~# ipmitool fru print 0
 Chassis Type          : Rack Mount Chassis
 Chassis Area Checksum : OK


5. For comparison, write only the minimal FRU:

root@akita:~# ipmitool fru write 0 demo.bin
Fru Size         : 512 bytes
Size to Write    : 24 bytes

Wait ~10 seconds.

root@akita:~# ipmitool fru print 0
 Chassis Type          : Rack Mount Chassis
 Chassis Part Number   : 4711
 Chassis Serial        : 12345
 Chassis Area Checksum : OK


I dug into this and found that this is caused by an interaction
between https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
and https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp.

Here's what happens:
- The "ipmitool fru write" request is handled by storagecommands.cpp
ipmiStorageWriteFruData.

- ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
several calls. On the initial call, it requests the current FRU blob
via D-bus from fru-device and caches it in the process. It then
overwrites sections of that cache with the received data from IPMI.

- ipmiStorageWriteFruData uses information from the FRU header to
check whether it received all the bytes that make up the new FRU. Note
that this could be fewer bytes than the size of the cached blob. Once
it receives all the bytes for the new FRU, it calls
/xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
blob (i.e. the full cache with modifications on top). After that the
cache is cleared.

- The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
first performs a sanity check and then writes the blob to the
underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
subsequently calls rescanBusses() which actually executes after about
1 second.

- Meanwhile, "ipmitool fru write" continues to happily send additional
bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
cache has just been cleared, it retrieves the current FRU from
fru-device again. However, since that has not yet completed
"rescanBusses", it actually received the original FRU again, not the
modified version. The above cycle repeats, but this time the original
FRU with additional modifications from the additional bytes.

In the best case (if the new FRU data is larger than the original FRU
data), the result is that the FRU did not get updated at all, since we
effectively just wrote back the original FRU. However, if the new FRU
data is smaller than the original FRU data, this may result in
corrupted FRU data to be persisted.


I was trying to figure out how to best fix that, but couldn't come up
with a design that would still work. Some ideas:

A)  I think what we're missing is feedback from fru-device to ipmid
that the FRU write operation has actually completed and the FRU data
was re-read. The ipmid should probably not accept any additional write
requests until the previous write request has completed and the new
FRU data is available.

B) If ipmiStorageWriteFruData cannot detect the expected size of the
FRU data, it relies on a timeout to eventually write the blob if no
more data was received from IPMI. I wonder if this mechanism is "too
clever" and we should simply always just wait for the timeout?

C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
was called. Maybe we should keep that cache around until the timeout
expires?

Thoughts?

Thanks
Oskar.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ipmitool fru write 0 - does not update "baseboard" FRU
  2024-05-02 18:31 ipmitool fru write 0 - does not update "baseboard" FRU Oskar Senft
@ 2024-05-03 15:38 ` Willy Tu
  2024-05-07 17:32   ` Oskar Senft
  0 siblings, 1 reply; 7+ messages in thread
From: Willy Tu @ 2024-05-03 15:38 UTC (permalink / raw)
  To: Oskar Senft; +Cc: OpenBMC Maillist, Ali El-Haj-Mahmoud, Ed Tanous

[-- Attachment #1: Type: text/plain, Size: 7656 bytes --]

Hi Oskar,

> C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
was called. Maybe we should keep that cache around until the timeout
expires?

It seems like this is an issue of multiple clients taking to ipmid. In the
middle of writing... There is another client that is reading or something
else and will reset the fruCache as you mentioned. In that case, I think it
may be best to refactor it out to use another getFru method... maybe like
getFruToWrite

Maybe just something like this

```
std::vector<uint8_t> getFruToWrite(...){
  if (writeTimer->isRunning()) {
    return fruCacheForWrite;
  }
  auto [_, fruCacheForWrite] = getFru(...);
  return fruCacheForWrite;
}
```

Also need to change `writeFruCache` and such to use `fruCacheForWrite` and
such.

Please let me know if that works for you. Feel free to reach out internally
for faster feedback.

Willy Tu

On Thu, May 2, 2024 at 11:32 AM Oskar Senft <osk@google.com> wrote:

> Hi everyone
>
> tl;dr - Attempting "ipmitool fru write" with an input file that
> contains additional bytes beyond the actual FRU data does not actually
> update the FRU on OpenBMC at head w/ entity-manager.
>
> Details:
>
> I ran into an issue with updating the "baseboard" FRU (0), which is
> stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
> specific to FRU 0 and could apply to other updateable FRUs in the same
> way.
>
>
> 1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
> contains chassis type (that's required for entity-manager's fru-device
> to recognize the file.
>
> root@akita:~# ls -al /etc/fru/baseboard.fru.bin
> -rw-r--r--    1 root     root           512 Sep 21 05:21
> /etc/fru/baseboard.fru.bin
>
> root@akita:~# hexdump -C /etc/fru/baseboard.fru.bin
> 00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6
> |................|
> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> |................|
> *
> 00000200
>
> root@akita:~# ipmitool fru print 0
>  Chassis Type          : Rack Mount Chassis
>  Chassis Area Checksum : OK
>
>
> 2. Prepare a FRU file with additional data, e.g. with frugy:
>
> (frugy) osk@osk:~$ cat demo.yml
> ChassisInfo:
>   type: 23
>   part_number: '4711'
>   serial_number: '12345'
>
> (frugy) osk@osk:~$ frugy demo.yml -o demo.bin
>
> (frugy) osk@osk:~$ ls -al demo.bin
> -rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin
>
> (frugy) osk@osk:~$ hexdump -C demo.bin
> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
> |............4711|
> 00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
> 00000018
>
> Note that frugy generates a minimal binary by default. However, other
> processes may not and instead produce a fixed size binary blob. This
> happens, e.g. when performing "ipmitool fru read" which returns the
> whole contents of the underlying storage. A subsequent "ipmitool fru
> write" with that blob will fail as explained here.
>
> To simulate this here, increase the file to 256 bytes:
>
> (frugy) osk@osk:~$ cp demo.bin demo-256.bin
> (frugy) osk@osk:~$ dd if=/dev/zero bs=1 seek=256 count=0 of=demo-256.bin
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 5.1138e-05 s, 0.0 kB/s
>
> (frugy) osk@osk:~$ ls -al demo-256.bin
> -rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin
>
> (frugy) osk@osk:~$ hexdump -C demo-256.bin
> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
> |............4711|
> 00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00
> |.12345..........|
> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> |................|
> *
> 00000100
>
>
> 3. Write the newly generated FRU:
>
> root@akita:~# ipmitool fru write 0 demo-256.bin
> Fru Size         : 512 bytes
> Size to Write    : 256 bytes
>
> Wait ~10 seconds for the fru-device process to reload the contents.
>
>
> 4. Re-read the FRU contents:
>
> root@akita:~# ipmitool fru print 0
>  Chassis Type          : Rack Mount Chassis
>  Chassis Area Checksum : OK
>
>
> 5. For comparison, write only the minimal FRU:
>
> root@akita:~# ipmitool fru write 0 demo.bin
> Fru Size         : 512 bytes
> Size to Write    : 24 bytes
>
> Wait ~10 seconds.
>
> root@akita:~# ipmitool fru print 0
>  Chassis Type          : Rack Mount Chassis
>  Chassis Part Number   : 4711
>  Chassis Serial        : 12345
>  Chassis Area Checksum : OK
>
>
> I dug into this and found that this is caused by an interaction
> between
> https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
> and
> https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp.
>
> Here's what happens:
> - The "ipmitool fru write" request is handled by storagecommands.cpp
> ipmiStorageWriteFruData.
>
> - ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
> several calls. On the initial call, it requests the current FRU blob
> via D-bus from fru-device and caches it in the process. It then
> overwrites sections of that cache with the received data from IPMI.
>
> - ipmiStorageWriteFruData uses information from the FRU header to
> check whether it received all the bytes that make up the new FRU. Note
> that this could be fewer bytes than the size of the cached blob. Once
> it receives all the bytes for the new FRU, it calls
> /xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
> blob (i.e. the full cache with modifications on top). After that the
> cache is cleared.
>
> - The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
> first performs a sanity check and then writes the blob to the
> underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
> subsequently calls rescanBusses() which actually executes after about
> 1 second.
>
> - Meanwhile, "ipmitool fru write" continues to happily send additional
> bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
> cache has just been cleared, it retrieves the current FRU from
> fru-device again. However, since that has not yet completed
> "rescanBusses", it actually received the original FRU again, not the
> modified version. The above cycle repeats, but this time the original
> FRU with additional modifications from the additional bytes.
>
> In the best case (if the new FRU data is larger than the original FRU
> data), the result is that the FRU did not get updated at all, since we
> effectively just wrote back the original FRU. However, if the new FRU
> data is smaller than the original FRU data, this may result in
> corrupted FRU data to be persisted.
>
>
> I was trying to figure out how to best fix that, but couldn't come up
> with a design that would still work. Some ideas:
>
> A)  I think what we're missing is feedback from fru-device to ipmid
> that the FRU write operation has actually completed and the FRU data
> was re-read. The ipmid should probably not accept any additional write
> requests until the previous write request has completed and the new
> FRU data is available.
>
> B) If ipmiStorageWriteFruData cannot detect the expected size of the
> FRU data, it relies on a timeout to eventually write the blob if no
> more data was received from IPMI. I wonder if this mechanism is "too
> clever" and we should simply always just wait for the timeout?
>
> C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
> was called. Maybe we should keep that cache around until the timeout
> expires?
>
> Thoughts?
>
> Thanks
> Oskar.
>

[-- Attachment #2: Type: text/html, Size: 9031 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ipmitool fru write 0 - does not update "baseboard" FRU
  2024-05-03 15:38 ` Willy Tu
@ 2024-05-07 17:32   ` Oskar Senft
  2024-05-07 23:49     ` Willy Tu
  0 siblings, 1 reply; 7+ messages in thread
From: Oskar Senft @ 2024-05-07 17:32 UTC (permalink / raw)
  To: Willy Tu; +Cc: OpenBMC Maillist, Ali El-Haj-Mahmoud, Ed Tanous

[-- Attachment #1: Type: text/plain, Size: 9077 bytes --]

Hi Willy

Thanks for your input!

From what I can tell, the current implementation will fail in wondrous ways
if there's more than one IPMI client trying to write FRU at the same time.
The existing getFru guards against changing target devId between calls to
not hand-out the same cache for different requests. However, this will
clearly break when different IPMI clients attempt to write the same or
different FRUs at the same time.

We could argue whether that's a supported use case or if we just assume
that'll never happen ... it does seem like quite a bit of an edge case,
though.

I do see it as an issue if there were multiple clients with only one
writing, but others reading - that'll fail in similarly weird ways.

I'm wondering: Do we want to have the fruCache for _reads_ at all? It seems
actually quite wrong, since subsequent reads for the same FRU would always
return the same result, even if the FRU changed through some other
mechanism.

Let me work on a fix that would use the cache only for writing and would
keep it around until the timeout expired.

Oskar.

On Fri, May 3, 2024 at 11:38 AM Willy Tu <wltu@google.com> wrote:

> Hi Oskar,
>
> > C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
> was called. Maybe we should keep that cache around until the timeout
> expires?
>
> It seems like this is an issue of multiple clients taking to ipmid. In the
> middle of writing... There is another client that is reading or something
> else and will reset the fruCache as you mentioned. In that case, I think
> it may be best to refactor it out to use another getFru method... maybe
> like getFruToWrite
>
> Maybe just something like this
>
> ```
> std::vector<uint8_t> getFruToWrite(...){
>   if (writeTimer->isRunning()) {
>     return fruCacheForWrite;
>   }
>   auto [_, fruCacheForWrite] = getFru(...);
>   return fruCacheForWrite;
> }
> ```
>
> Also need to change `writeFruCache` and such to use `fruCacheForWrite` and
> such.
>
> Please let me know if that works for you. Feel free to reach out
> internally for faster feedback.
>
> Willy Tu
>
> On Thu, May 2, 2024 at 11:32 AM Oskar Senft <osk@google.com> wrote:
>
>> Hi everyone
>>
>> tl;dr - Attempting "ipmitool fru write" with an input file that
>> contains additional bytes beyond the actual FRU data does not actually
>> update the FRU on OpenBMC at head w/ entity-manager.
>>
>> Details:
>>
>> I ran into an issue with updating the "baseboard" FRU (0), which is
>> stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
>> specific to FRU 0 and could apply to other updateable FRUs in the same
>> way.
>>
>>
>> 1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
>> contains chassis type (that's required for entity-manager's fru-device
>> to recognize the file.
>>
>> root@akita:~# ls -al /etc/fru/baseboard.fru.bin
>> -rw-r--r--    1 root     root           512 Sep 21 05:21
>> /etc/fru/baseboard.fru.bin
>>
>> root@akita:~# hexdump -C /etc/fru/baseboard.fru.bin
>> 00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6
>> |................|
>> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>> |................|
>> *
>> 00000200
>>
>> root@akita:~# ipmitool fru print 0
>>  Chassis Type          : Rack Mount Chassis
>>  Chassis Area Checksum : OK
>>
>>
>> 2. Prepare a FRU file with additional data, e.g. with frugy:
>>
>> (frugy) osk@osk:~$ cat demo.yml
>> ChassisInfo:
>>   type: 23
>>   part_number: '4711'
>>   serial_number: '12345'
>>
>> (frugy) osk@osk:~$ frugy demo.yml -o demo.bin
>>
>> (frugy) osk@osk:~$ ls -al demo.bin
>> -rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin
>>
>> (frugy) osk@osk:~$ hexdump -C demo.bin
>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>> |............4711|
>> 00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
>> 00000018
>>
>> Note that frugy generates a minimal binary by default. However, other
>> processes may not and instead produce a fixed size binary blob. This
>> happens, e.g. when performing "ipmitool fru read" which returns the
>> whole contents of the underlying storage. A subsequent "ipmitool fru
>> write" with that blob will fail as explained here.
>>
>> To simulate this here, increase the file to 256 bytes:
>>
>> (frugy) osk@osk:~$ cp demo.bin demo-256.bin
>> (frugy) osk@osk:~$ dd if=/dev/zero bs=1 seek=256 count=0 of=demo-256.bin
>> 0+0 records in
>> 0+0 records out
>> 0 bytes copied, 5.1138e-05 s, 0.0 kB/s
>>
>> (frugy) osk@osk:~$ ls -al demo-256.bin
>> -rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin
>>
>> (frugy) osk@osk:~$ hexdump -C demo-256.bin
>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>> |............4711|
>> 00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00
>> |.12345..........|
>> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>> |................|
>> *
>> 00000100
>>
>>
>> 3. Write the newly generated FRU:
>>
>> root@akita:~# ipmitool fru write 0 demo-256.bin
>> Fru Size         : 512 bytes
>> Size to Write    : 256 bytes
>>
>> Wait ~10 seconds for the fru-device process to reload the contents.
>>
>>
>> 4. Re-read the FRU contents:
>>
>> root@akita:~# ipmitool fru print 0
>>  Chassis Type          : Rack Mount Chassis
>>  Chassis Area Checksum : OK
>>
>>
>> 5. For comparison, write only the minimal FRU:
>>
>> root@akita:~# ipmitool fru write 0 demo.bin
>> Fru Size         : 512 bytes
>> Size to Write    : 24 bytes
>>
>> Wait ~10 seconds.
>>
>> root@akita:~# ipmitool fru print 0
>>  Chassis Type          : Rack Mount Chassis
>>  Chassis Part Number   : 4711
>>  Chassis Serial        : 12345
>>  Chassis Area Checksum : OK
>>
>>
>> I dug into this and found that this is caused by an interaction
>> between
>> https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
>> and
>> https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp.
>>
>> Here's what happens:
>> - The "ipmitool fru write" request is handled by storagecommands.cpp
>> ipmiStorageWriteFruData.
>>
>> - ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
>> several calls. On the initial call, it requests the current FRU blob
>> via D-bus from fru-device and caches it in the process. It then
>> overwrites sections of that cache with the received data from IPMI.
>>
>> - ipmiStorageWriteFruData uses information from the FRU header to
>> check whether it received all the bytes that make up the new FRU. Note
>> that this could be fewer bytes than the size of the cached blob. Once
>> it receives all the bytes for the new FRU, it calls
>> /xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
>> blob (i.e. the full cache with modifications on top). After that the
>> cache is cleared.
>>
>> - The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
>> first performs a sanity check and then writes the blob to the
>> underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
>> subsequently calls rescanBusses() which actually executes after about
>> 1 second.
>>
>> - Meanwhile, "ipmitool fru write" continues to happily send additional
>> bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
>> cache has just been cleared, it retrieves the current FRU from
>> fru-device again. However, since that has not yet completed
>> "rescanBusses", it actually received the original FRU again, not the
>> modified version. The above cycle repeats, but this time the original
>> FRU with additional modifications from the additional bytes.
>>
>> In the best case (if the new FRU data is larger than the original FRU
>> data), the result is that the FRU did not get updated at all, since we
>> effectively just wrote back the original FRU. However, if the new FRU
>> data is smaller than the original FRU data, this may result in
>> corrupted FRU data to be persisted.
>>
>>
>> I was trying to figure out how to best fix that, but couldn't come up
>> with a design that would still work. Some ideas:
>>
>> A)  I think what we're missing is feedback from fru-device to ipmid
>> that the FRU write operation has actually completed and the FRU data
>> was re-read. The ipmid should probably not accept any additional write
>> requests until the previous write request has completed and the new
>> FRU data is available.
>>
>> B) If ipmiStorageWriteFruData cannot detect the expected size of the
>> FRU data, it relies on a timeout to eventually write the blob if no
>> more data was received from IPMI. I wonder if this mechanism is "too
>> clever" and we should simply always just wait for the timeout?
>>
>> C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>> was called. Maybe we should keep that cache around until the timeout
>> expires?
>>
>> Thoughts?
>>
>> Thanks
>> Oskar.
>>
>

[-- Attachment #2: Type: text/html, Size: 10664 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ipmitool fru write 0 - does not update "baseboard" FRU
  2024-05-07 17:32   ` Oskar Senft
@ 2024-05-07 23:49     ` Willy Tu
  2024-05-08  0:04       ` Oskar Senft
  0 siblings, 1 reply; 7+ messages in thread
From: Willy Tu @ 2024-05-07 23:49 UTC (permalink / raw)
  To: Oskar Senft; +Cc: OpenBMC Maillist, Ali El-Haj-Mahmoud, Ed Tanous

[-- Attachment #1: Type: text/plain, Size: 9906 bytes --]

I think we should make it so that if a different IPMI client tries to write
the Fru we prevent it and only allow one write at a time. I think the
fruCache for Read is mostly from some commands that is dealing with one
device ID multiple times. When reading subsequent ids... then it doesn't do
anything.

I think we will need more discussion on that topic since it will be a
larger refactor to make that work.

Sounds good. Please let me know how it goes.

Willy Tu

On Tue, May 7, 2024 at 10:32 AM Oskar Senft <osk@google.com> wrote:

> Hi Willy
>
> Thanks for your input!
>
> From what I can tell, the current implementation will fail in wondrous
> ways if there's more than one IPMI client trying to write FRU at the same
> time. The existing getFru guards against changing target devId between
> calls to not hand-out the same cache for different requests. However, this
> will clearly break when different IPMI clients attempt to write the same or
> different FRUs at the same time.
>
> We could argue whether that's a supported use case or if we just assume
> that'll never happen ... it does seem like quite a bit of an edge case,
> though.
>
> I do see it as an issue if there were multiple clients with only one
> writing, but others reading - that'll fail in similarly weird ways.
>
> I'm wondering: Do we want to have the fruCache for _reads_ at all? It
> seems actually quite wrong, since subsequent reads for the same FRU would
> always return the same result, even if the FRU changed through some other
> mechanism.
>
> Let me work on a fix that would use the cache only for writing and would
> keep it around until the timeout expired.
>
> Oskar.
>
> On Fri, May 3, 2024 at 11:38 AM Willy Tu <wltu@google.com> wrote:
>
>> Hi Oskar,
>>
>> > C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>> was called. Maybe we should keep that cache around until the timeout
>> expires?
>>
>> It seems like this is an issue of multiple clients taking to ipmid. In
>> the middle of writing... There is another client that is reading or
>> something
>> else and will reset the fruCache as you mentioned. In that case, I think
>> it may be best to refactor it out to use another getFru method... maybe
>> like getFruToWrite
>>
>> Maybe just something like this
>>
>> ```
>> std::vector<uint8_t> getFruToWrite(...){
>>   if (writeTimer->isRunning()) {
>>     return fruCacheForWrite;
>>   }
>>   auto [_, fruCacheForWrite] = getFru(...);
>>   return fruCacheForWrite;
>> }
>> ```
>>
>> Also need to change `writeFruCache` and such to use `fruCacheForWrite`
>> and such.
>>
>> Please let me know if that works for you. Feel free to reach out
>> internally for faster feedback.
>>
>> Willy Tu
>>
>> On Thu, May 2, 2024 at 11:32 AM Oskar Senft <osk@google.com> wrote:
>>
>>> Hi everyone
>>>
>>> tl;dr - Attempting "ipmitool fru write" with an input file that
>>> contains additional bytes beyond the actual FRU data does not actually
>>> update the FRU on OpenBMC at head w/ entity-manager.
>>>
>>> Details:
>>>
>>> I ran into an issue with updating the "baseboard" FRU (0), which is
>>> stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
>>> specific to FRU 0 and could apply to other updateable FRUs in the same
>>> way.
>>>
>>>
>>> 1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
>>> contains chassis type (that's required for entity-manager's fru-device
>>> to recognize the file.
>>>
>>> root@akita:~# ls -al /etc/fru/baseboard.fru.bin
>>> -rw-r--r--    1 root     root           512 Sep 21 05:21
>>> /etc/fru/baseboard.fru.bin
>>>
>>> root@akita:~# hexdump -C /etc/fru/baseboard.fru.bin
>>> 00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6
>>> |................|
>>> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>> |................|
>>> *
>>> 00000200
>>>
>>> root@akita:~# ipmitool fru print 0
>>>  Chassis Type          : Rack Mount Chassis
>>>  Chassis Area Checksum : OK
>>>
>>>
>>> 2. Prepare a FRU file with additional data, e.g. with frugy:
>>>
>>> (frugy) osk@osk:~$ cat demo.yml
>>> ChassisInfo:
>>>   type: 23
>>>   part_number: '4711'
>>>   serial_number: '12345'
>>>
>>> (frugy) osk@osk:~$ frugy demo.yml -o demo.bin
>>>
>>> (frugy) osk@osk:~$ ls -al demo.bin
>>> -rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin
>>>
>>> (frugy) osk@osk:~$ hexdump -C demo.bin
>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>> |............4711|
>>> 00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
>>> 00000018
>>>
>>> Note that frugy generates a minimal binary by default. However, other
>>> processes may not and instead produce a fixed size binary blob. This
>>> happens, e.g. when performing "ipmitool fru read" which returns the
>>> whole contents of the underlying storage. A subsequent "ipmitool fru
>>> write" with that blob will fail as explained here.
>>>
>>> To simulate this here, increase the file to 256 bytes:
>>>
>>> (frugy) osk@osk:~$ cp demo.bin demo-256.bin
>>> (frugy) osk@osk:~$ dd if=/dev/zero bs=1 seek=256 count=0 of=demo-256.bin
>>> 0+0 records in
>>> 0+0 records out
>>> 0 bytes copied, 5.1138e-05 s, 0.0 kB/s
>>>
>>> (frugy) osk@osk:~$ ls -al demo-256.bin
>>> -rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin
>>>
>>> (frugy) osk@osk:~$ hexdump -C demo-256.bin
>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>> |............4711|
>>> 00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00
>>> |.12345..........|
>>> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>> |................|
>>> *
>>> 00000100
>>>
>>>
>>> 3. Write the newly generated FRU:
>>>
>>> root@akita:~# ipmitool fru write 0 demo-256.bin
>>> Fru Size         : 512 bytes
>>> Size to Write    : 256 bytes
>>>
>>> Wait ~10 seconds for the fru-device process to reload the contents.
>>>
>>>
>>> 4. Re-read the FRU contents:
>>>
>>> root@akita:~# ipmitool fru print 0
>>>  Chassis Type          : Rack Mount Chassis
>>>  Chassis Area Checksum : OK
>>>
>>>
>>> 5. For comparison, write only the minimal FRU:
>>>
>>> root@akita:~# ipmitool fru write 0 demo.bin
>>> Fru Size         : 512 bytes
>>> Size to Write    : 24 bytes
>>>
>>> Wait ~10 seconds.
>>>
>>> root@akita:~# ipmitool fru print 0
>>>  Chassis Type          : Rack Mount Chassis
>>>  Chassis Part Number   : 4711
>>>  Chassis Serial        : 12345
>>>  Chassis Area Checksum : OK
>>>
>>>
>>> I dug into this and found that this is caused by an interaction
>>> between
>>> https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
>>> and
>>> https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp
>>> .
>>>
>>> Here's what happens:
>>> - The "ipmitool fru write" request is handled by storagecommands.cpp
>>> ipmiStorageWriteFruData.
>>>
>>> - ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
>>> several calls. On the initial call, it requests the current FRU blob
>>> via D-bus from fru-device and caches it in the process. It then
>>> overwrites sections of that cache with the received data from IPMI.
>>>
>>> - ipmiStorageWriteFruData uses information from the FRU header to
>>> check whether it received all the bytes that make up the new FRU. Note
>>> that this could be fewer bytes than the size of the cached blob. Once
>>> it receives all the bytes for the new FRU, it calls
>>> /xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
>>> blob (i.e. the full cache with modifications on top). After that the
>>> cache is cleared.
>>>
>>> - The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
>>> first performs a sanity check and then writes the blob to the
>>> underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
>>> subsequently calls rescanBusses() which actually executes after about
>>> 1 second.
>>>
>>> - Meanwhile, "ipmitool fru write" continues to happily send additional
>>> bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
>>> cache has just been cleared, it retrieves the current FRU from
>>> fru-device again. However, since that has not yet completed
>>> "rescanBusses", it actually received the original FRU again, not the
>>> modified version. The above cycle repeats, but this time the original
>>> FRU with additional modifications from the additional bytes.
>>>
>>> In the best case (if the new FRU data is larger than the original FRU
>>> data), the result is that the FRU did not get updated at all, since we
>>> effectively just wrote back the original FRU. However, if the new FRU
>>> data is smaller than the original FRU data, this may result in
>>> corrupted FRU data to be persisted.
>>>
>>>
>>> I was trying to figure out how to best fix that, but couldn't come up
>>> with a design that would still work. Some ideas:
>>>
>>> A)  I think what we're missing is feedback from fru-device to ipmid
>>> that the FRU write operation has actually completed and the FRU data
>>> was re-read. The ipmid should probably not accept any additional write
>>> requests until the previous write request has completed and the new
>>> FRU data is available.
>>>
>>> B) If ipmiStorageWriteFruData cannot detect the expected size of the
>>> FRU data, it relies on a timeout to eventually write the blob if no
>>> more data was received from IPMI. I wonder if this mechanism is "too
>>> clever" and we should simply always just wait for the timeout?
>>>
>>> C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>>> was called. Maybe we should keep that cache around until the timeout
>>> expires?
>>>
>>> Thoughts?
>>>
>>> Thanks
>>> Oskar.
>>>
>>

[-- Attachment #2: Type: text/html, Size: 11568 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ipmitool fru write 0 - does not update "baseboard" FRU
  2024-05-07 23:49     ` Willy Tu
@ 2024-05-08  0:04       ` Oskar Senft
  2024-05-09 16:17         ` Willy Tu
  0 siblings, 1 reply; 7+ messages in thread
From: Oskar Senft @ 2024-05-08  0:04 UTC (permalink / raw)
  To: Willy Tu; +Cc: OpenBMC Maillist, Ali El-Haj-Mahmoud, Ed Tanous

[-- Attachment #1: Type: text/plain, Size: 10677 bytes --]

I got some code changes that look right, but haven't done much testing yet.
I realized that a read cache is useful, so I kept it. I split read and
write cache by defining a new class that keeps all the related data
together.

Any suggestions on how to distinguish different clients? Or maybe we just
error out when we receive a request for a different FRU while we're still
not done with the first one?

Oskar.

On Tue, May 7, 2024 at 7:49 PM Willy Tu <wltu@google.com> wrote:

> I think we should make it so that if a different IPMI client tries to
> write the Fru we prevent it and only allow one write at a time. I think the
> fruCache for Read is mostly from some commands that is dealing with one
> device ID multiple times. When reading subsequent ids... then it doesn't do
> anything.
>
> I think we will need more discussion on that topic since it will be a
> larger refactor to make that work.
>
> Sounds good. Please let me know how it goes.
>
> Willy Tu
>
> On Tue, May 7, 2024 at 10:32 AM Oskar Senft <osk@google.com> wrote:
>
>> Hi Willy
>>
>> Thanks for your input!
>>
>> From what I can tell, the current implementation will fail in wondrous
>> ways if there's more than one IPMI client trying to write FRU at the same
>> time. The existing getFru guards against changing target devId between
>> calls to not hand-out the same cache for different requests. However, this
>> will clearly break when different IPMI clients attempt to write the same or
>> different FRUs at the same time.
>>
>> We could argue whether that's a supported use case or if we just assume
>> that'll never happen ... it does seem like quite a bit of an edge case,
>> though.
>>
>> I do see it as an issue if there were multiple clients with only one
>> writing, but others reading - that'll fail in similarly weird ways.
>>
>> I'm wondering: Do we want to have the fruCache for _reads_ at all? It
>> seems actually quite wrong, since subsequent reads for the same FRU would
>> always return the same result, even if the FRU changed through some other
>> mechanism.
>>
>> Let me work on a fix that would use the cache only for writing and would
>> keep it around until the timeout expired.
>>
>> Oskar.
>>
>> On Fri, May 3, 2024 at 11:38 AM Willy Tu <wltu@google.com> wrote:
>>
>>> Hi Oskar,
>>>
>>> > C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>>> was called. Maybe we should keep that cache around until the timeout
>>> expires?
>>>
>>> It seems like this is an issue of multiple clients taking to ipmid. In
>>> the middle of writing... There is another client that is reading or
>>> something
>>> else and will reset the fruCache as you mentioned. In that case, I think
>>> it may be best to refactor it out to use another getFru method... maybe
>>> like getFruToWrite
>>>
>>> Maybe just something like this
>>>
>>> ```
>>> std::vector<uint8_t> getFruToWrite(...){
>>>   if (writeTimer->isRunning()) {
>>>     return fruCacheForWrite;
>>>   }
>>>   auto [_, fruCacheForWrite] = getFru(...);
>>>   return fruCacheForWrite;
>>> }
>>> ```
>>>
>>> Also need to change `writeFruCache` and such to use `fruCacheForWrite`
>>> and such.
>>>
>>> Please let me know if that works for you. Feel free to reach out
>>> internally for faster feedback.
>>>
>>> Willy Tu
>>>
>>> On Thu, May 2, 2024 at 11:32 AM Oskar Senft <osk@google.com> wrote:
>>>
>>>> Hi everyone
>>>>
>>>> tl;dr - Attempting "ipmitool fru write" with an input file that
>>>> contains additional bytes beyond the actual FRU data does not actually
>>>> update the FRU on OpenBMC at head w/ entity-manager.
>>>>
>>>> Details:
>>>>
>>>> I ran into an issue with updating the "baseboard" FRU (0), which is
>>>> stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
>>>> specific to FRU 0 and could apply to other updateable FRUs in the same
>>>> way.
>>>>
>>>>
>>>> 1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
>>>> contains chassis type (that's required for entity-manager's fru-device
>>>> to recognize the file.
>>>>
>>>> root@akita:~# ls -al /etc/fru/baseboard.fru.bin
>>>> -rw-r--r--    1 root     root           512 Sep 21 05:21
>>>> /etc/fru/baseboard.fru.bin
>>>>
>>>> root@akita:~# hexdump -C /etc/fru/baseboard.fru.bin
>>>> 00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6
>>>> |................|
>>>> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>> |................|
>>>> *
>>>> 00000200
>>>>
>>>> root@akita:~# ipmitool fru print 0
>>>>  Chassis Type          : Rack Mount Chassis
>>>>  Chassis Area Checksum : OK
>>>>
>>>>
>>>> 2. Prepare a FRU file with additional data, e.g. with frugy:
>>>>
>>>> (frugy) osk@osk:~$ cat demo.yml
>>>> ChassisInfo:
>>>>   type: 23
>>>>   part_number: '4711'
>>>>   serial_number: '12345'
>>>>
>>>> (frugy) osk@osk:~$ frugy demo.yml -o demo.bin
>>>>
>>>> (frugy) osk@osk:~$ ls -al demo.bin
>>>> -rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin
>>>>
>>>> (frugy) osk@osk:~$ hexdump -C demo.bin
>>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>>> |............4711|
>>>> 00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
>>>> 00000018
>>>>
>>>> Note that frugy generates a minimal binary by default. However, other
>>>> processes may not and instead produce a fixed size binary blob. This
>>>> happens, e.g. when performing "ipmitool fru read" which returns the
>>>> whole contents of the underlying storage. A subsequent "ipmitool fru
>>>> write" with that blob will fail as explained here.
>>>>
>>>> To simulate this here, increase the file to 256 bytes:
>>>>
>>>> (frugy) osk@osk:~$ cp demo.bin demo-256.bin
>>>> (frugy) osk@osk:~$ dd if=/dev/zero bs=1 seek=256 count=0
>>>> of=demo-256.bin
>>>> 0+0 records in
>>>> 0+0 records out
>>>> 0 bytes copied, 5.1138e-05 s, 0.0 kB/s
>>>>
>>>> (frugy) osk@osk:~$ ls -al demo-256.bin
>>>> -rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin
>>>>
>>>> (frugy) osk@osk:~$ hexdump -C demo-256.bin
>>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>>> |............4711|
>>>> 00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00
>>>> |.12345..........|
>>>> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>> |................|
>>>> *
>>>> 00000100
>>>>
>>>>
>>>> 3. Write the newly generated FRU:
>>>>
>>>> root@akita:~# ipmitool fru write 0 demo-256.bin
>>>> Fru Size         : 512 bytes
>>>> Size to Write    : 256 bytes
>>>>
>>>> Wait ~10 seconds for the fru-device process to reload the contents.
>>>>
>>>>
>>>> 4. Re-read the FRU contents:
>>>>
>>>> root@akita:~# ipmitool fru print 0
>>>>  Chassis Type          : Rack Mount Chassis
>>>>  Chassis Area Checksum : OK
>>>>
>>>>
>>>> 5. For comparison, write only the minimal FRU:
>>>>
>>>> root@akita:~# ipmitool fru write 0 demo.bin
>>>> Fru Size         : 512 bytes
>>>> Size to Write    : 24 bytes
>>>>
>>>> Wait ~10 seconds.
>>>>
>>>> root@akita:~# ipmitool fru print 0
>>>>  Chassis Type          : Rack Mount Chassis
>>>>  Chassis Part Number   : 4711
>>>>  Chassis Serial        : 12345
>>>>  Chassis Area Checksum : OK
>>>>
>>>>
>>>> I dug into this and found that this is caused by an interaction
>>>> between
>>>> https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
>>>> and
>>>> https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp
>>>> .
>>>>
>>>> Here's what happens:
>>>> - The "ipmitool fru write" request is handled by storagecommands.cpp
>>>> ipmiStorageWriteFruData.
>>>>
>>>> - ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
>>>> several calls. On the initial call, it requests the current FRU blob
>>>> via D-bus from fru-device and caches it in the process. It then
>>>> overwrites sections of that cache with the received data from IPMI.
>>>>
>>>> - ipmiStorageWriteFruData uses information from the FRU header to
>>>> check whether it received all the bytes that make up the new FRU. Note
>>>> that this could be fewer bytes than the size of the cached blob. Once
>>>> it receives all the bytes for the new FRU, it calls
>>>> /xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
>>>> blob (i.e. the full cache with modifications on top). After that the
>>>> cache is cleared.
>>>>
>>>> - The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
>>>> first performs a sanity check and then writes the blob to the
>>>> underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
>>>> subsequently calls rescanBusses() which actually executes after about
>>>> 1 second.
>>>>
>>>> - Meanwhile, "ipmitool fru write" continues to happily send additional
>>>> bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
>>>> cache has just been cleared, it retrieves the current FRU from
>>>> fru-device again. However, since that has not yet completed
>>>> "rescanBusses", it actually received the original FRU again, not the
>>>> modified version. The above cycle repeats, but this time the original
>>>> FRU with additional modifications from the additional bytes.
>>>>
>>>> In the best case (if the new FRU data is larger than the original FRU
>>>> data), the result is that the FRU did not get updated at all, since we
>>>> effectively just wrote back the original FRU. However, if the new FRU
>>>> data is smaller than the original FRU data, this may result in
>>>> corrupted FRU data to be persisted.
>>>>
>>>>
>>>> I was trying to figure out how to best fix that, but couldn't come up
>>>> with a design that would still work. Some ideas:
>>>>
>>>> A)  I think what we're missing is feedback from fru-device to ipmid
>>>> that the FRU write operation has actually completed and the FRU data
>>>> was re-read. The ipmid should probably not accept any additional write
>>>> requests until the previous write request has completed and the new
>>>> FRU data is available.
>>>>
>>>> B) If ipmiStorageWriteFruData cannot detect the expected size of the
>>>> FRU data, it relies on a timeout to eventually write the blob if no
>>>> more data was received from IPMI. I wonder if this mechanism is "too
>>>> clever" and we should simply always just wait for the timeout?
>>>>
>>>> C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>>>> was called. Maybe we should keep that cache around until the timeout
>>>> expires?
>>>>
>>>> Thoughts?
>>>>
>>>> Thanks
>>>> Oskar.
>>>>
>>>

[-- Attachment #2: Type: text/html, Size: 12395 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ipmitool fru write 0 - does not update "baseboard" FRU
  2024-05-08  0:04       ` Oskar Senft
@ 2024-05-09 16:17         ` Willy Tu
  2024-05-20 13:03           ` Oskar Senft
  0 siblings, 1 reply; 7+ messages in thread
From: Willy Tu @ 2024-05-09 16:17 UTC (permalink / raw)
  To: Oskar Senft; +Cc: OpenBMC Maillist, Ali El-Haj-Mahmoud

[-- Attachment #1: Type: text/plain, Size: 11159 bytes --]

I am not aware of a way to differentiate between the clients. Might just
have to error out as you suggested.

Willy Tu

On Tue, May 7, 2024 at 5:05 PM Oskar Senft <osk@google.com> wrote:

> I got some code changes that look right, but haven't done much testing
> yet. I realized that a read cache is useful, so I kept it. I split read and
> write cache by defining a new class that keeps all the related data
> together.
>
> Any suggestions on how to distinguish different clients? Or maybe we just
> error out when we receive a request for a different FRU while we're still
> not done with the first one?
>
> Oskar.
>
> On Tue, May 7, 2024 at 7:49 PM Willy Tu <wltu@google.com> wrote:
>
>> I think we should make it so that if a different IPMI client tries to
>> write the Fru we prevent it and only allow one write at a time. I think the
>> fruCache for Read is mostly from some commands that is dealing with one
>> device ID multiple times. When reading subsequent ids... then it doesn't do
>> anything.
>>
>> I think we will need more discussion on that topic since it will be a
>> larger refactor to make that work.
>>
>> Sounds good. Please let me know how it goes.
>>
>> Willy Tu
>>
>> On Tue, May 7, 2024 at 10:32 AM Oskar Senft <osk@google.com> wrote:
>>
>>> Hi Willy
>>>
>>> Thanks for your input!
>>>
>>> From what I can tell, the current implementation will fail in wondrous
>>> ways if there's more than one IPMI client trying to write FRU at the same
>>> time. The existing getFru guards against changing target devId between
>>> calls to not hand-out the same cache for different requests. However, this
>>> will clearly break when different IPMI clients attempt to write the same or
>>> different FRUs at the same time.
>>>
>>> We could argue whether that's a supported use case or if we just assume
>>> that'll never happen ... it does seem like quite a bit of an edge case,
>>> though.
>>>
>>> I do see it as an issue if there were multiple clients with only one
>>> writing, but others reading - that'll fail in similarly weird ways.
>>>
>>> I'm wondering: Do we want to have the fruCache for _reads_ at all? It
>>> seems actually quite wrong, since subsequent reads for the same FRU would
>>> always return the same result, even if the FRU changed through some other
>>> mechanism.
>>>
>>> Let me work on a fix that would use the cache only for writing and would
>>> keep it around until the timeout expired.
>>>
>>> Oskar.
>>>
>>> On Fri, May 3, 2024 at 11:38 AM Willy Tu <wltu@google.com> wrote:
>>>
>>>> Hi Oskar,
>>>>
>>>> > C) ipmiStorageWriteFruData clears the cache immediately after
>>>> WriteFru was called. Maybe we should keep that cache around until the
>>>> timeout expires?
>>>>
>>>> It seems like this is an issue of multiple clients taking to ipmid. In
>>>> the middle of writing... There is another client that is reading or
>>>> something
>>>> else and will reset the fruCache as you mentioned. In that case, I
>>>> think it may be best to refactor it out to use another getFru method...
>>>> maybe like getFruToWrite
>>>>
>>>> Maybe just something like this
>>>>
>>>> ```
>>>> std::vector<uint8_t> getFruToWrite(...){
>>>>   if (writeTimer->isRunning()) {
>>>>     return fruCacheForWrite;
>>>>   }
>>>>   auto [_, fruCacheForWrite] = getFru(...);
>>>>   return fruCacheForWrite;
>>>> }
>>>> ```
>>>>
>>>> Also need to change `writeFruCache` and such to use `fruCacheForWrite`
>>>> and such.
>>>>
>>>> Please let me know if that works for you. Feel free to reach out
>>>> internally for faster feedback.
>>>>
>>>> Willy Tu
>>>>
>>>> On Thu, May 2, 2024 at 11:32 AM Oskar Senft <osk@google.com> wrote:
>>>>
>>>>> Hi everyone
>>>>>
>>>>> tl;dr - Attempting "ipmitool fru write" with an input file that
>>>>> contains additional bytes beyond the actual FRU data does not actually
>>>>> update the FRU on OpenBMC at head w/ entity-manager.
>>>>>
>>>>> Details:
>>>>>
>>>>> I ran into an issue with updating the "baseboard" FRU (0), which is
>>>>> stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
>>>>> specific to FRU 0 and could apply to other updateable FRUs in the same
>>>>> way.
>>>>>
>>>>>
>>>>> 1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
>>>>> contains chassis type (that's required for entity-manager's fru-device
>>>>> to recognize the file.
>>>>>
>>>>> root@akita:~# ls -al /etc/fru/baseboard.fru.bin
>>>>> -rw-r--r--    1 root     root           512 Sep 21 05:21
>>>>> /etc/fru/baseboard.fru.bin
>>>>>
>>>>> root@akita:~# hexdump -C /etc/fru/baseboard.fru.bin
>>>>> 00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6
>>>>> |................|
>>>>> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> |................|
>>>>> *
>>>>> 00000200
>>>>>
>>>>> root@akita:~# ipmitool fru print 0
>>>>>  Chassis Type          : Rack Mount Chassis
>>>>>  Chassis Area Checksum : OK
>>>>>
>>>>>
>>>>> 2. Prepare a FRU file with additional data, e.g. with frugy:
>>>>>
>>>>> (frugy) osk@osk:~$ cat demo.yml
>>>>> ChassisInfo:
>>>>>   type: 23
>>>>>   part_number: '4711'
>>>>>   serial_number: '12345'
>>>>>
>>>>> (frugy) osk@osk:~$ frugy demo.yml -o demo.bin
>>>>>
>>>>> (frugy) osk@osk:~$ ls -al demo.bin
>>>>> -rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin
>>>>>
>>>>> (frugy) osk@osk:~$ hexdump -C demo.bin
>>>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>>>> |............4711|
>>>>> 00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
>>>>> 00000018
>>>>>
>>>>> Note that frugy generates a minimal binary by default. However, other
>>>>> processes may not and instead produce a fixed size binary blob. This
>>>>> happens, e.g. when performing "ipmitool fru read" which returns the
>>>>> whole contents of the underlying storage. A subsequent "ipmitool fru
>>>>> write" with that blob will fail as explained here.
>>>>>
>>>>> To simulate this here, increase the file to 256 bytes:
>>>>>
>>>>> (frugy) osk@osk:~$ cp demo.bin demo-256.bin
>>>>> (frugy) osk@osk:~$ dd if=/dev/zero bs=1 seek=256 count=0
>>>>> of=demo-256.bin
>>>>> 0+0 records in
>>>>> 0+0 records out
>>>>> 0 bytes copied, 5.1138e-05 s, 0.0 kB/s
>>>>>
>>>>> (frugy) osk@osk:~$ ls -al demo-256.bin
>>>>> -rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin
>>>>>
>>>>> (frugy) osk@osk:~$ hexdump -C demo-256.bin
>>>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>>>> |............4711|
>>>>> 00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00
>>>>> |.12345..........|
>>>>> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> |................|
>>>>> *
>>>>> 00000100
>>>>>
>>>>>
>>>>> 3. Write the newly generated FRU:
>>>>>
>>>>> root@akita:~# ipmitool fru write 0 demo-256.bin
>>>>> Fru Size         : 512 bytes
>>>>> Size to Write    : 256 bytes
>>>>>
>>>>> Wait ~10 seconds for the fru-device process to reload the contents.
>>>>>
>>>>>
>>>>> 4. Re-read the FRU contents:
>>>>>
>>>>> root@akita:~# ipmitool fru print 0
>>>>>  Chassis Type          : Rack Mount Chassis
>>>>>  Chassis Area Checksum : OK
>>>>>
>>>>>
>>>>> 5. For comparison, write only the minimal FRU:
>>>>>
>>>>> root@akita:~# ipmitool fru write 0 demo.bin
>>>>> Fru Size         : 512 bytes
>>>>> Size to Write    : 24 bytes
>>>>>
>>>>> Wait ~10 seconds.
>>>>>
>>>>> root@akita:~# ipmitool fru print 0
>>>>>  Chassis Type          : Rack Mount Chassis
>>>>>  Chassis Part Number   : 4711
>>>>>  Chassis Serial        : 12345
>>>>>  Chassis Area Checksum : OK
>>>>>
>>>>>
>>>>> I dug into this and found that this is caused by an interaction
>>>>> between
>>>>> https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
>>>>> and
>>>>> https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp
>>>>> .
>>>>>
>>>>> Here's what happens:
>>>>> - The "ipmitool fru write" request is handled by storagecommands.cpp
>>>>> ipmiStorageWriteFruData.
>>>>>
>>>>> - ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
>>>>> several calls. On the initial call, it requests the current FRU blob
>>>>> via D-bus from fru-device and caches it in the process. It then
>>>>> overwrites sections of that cache with the received data from IPMI.
>>>>>
>>>>> - ipmiStorageWriteFruData uses information from the FRU header to
>>>>> check whether it received all the bytes that make up the new FRU. Note
>>>>> that this could be fewer bytes than the size of the cached blob. Once
>>>>> it receives all the bytes for the new FRU, it calls
>>>>> /xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
>>>>> blob (i.e. the full cache with modifications on top). After that the
>>>>> cache is cleared.
>>>>>
>>>>> - The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
>>>>> first performs a sanity check and then writes the blob to the
>>>>> underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
>>>>> subsequently calls rescanBusses() which actually executes after about
>>>>> 1 second.
>>>>>
>>>>> - Meanwhile, "ipmitool fru write" continues to happily send additional
>>>>> bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
>>>>> cache has just been cleared, it retrieves the current FRU from
>>>>> fru-device again. However, since that has not yet completed
>>>>> "rescanBusses", it actually received the original FRU again, not the
>>>>> modified version. The above cycle repeats, but this time the original
>>>>> FRU with additional modifications from the additional bytes.
>>>>>
>>>>> In the best case (if the new FRU data is larger than the original FRU
>>>>> data), the result is that the FRU did not get updated at all, since we
>>>>> effectively just wrote back the original FRU. However, if the new FRU
>>>>> data is smaller than the original FRU data, this may result in
>>>>> corrupted FRU data to be persisted.
>>>>>
>>>>>
>>>>> I was trying to figure out how to best fix that, but couldn't come up
>>>>> with a design that would still work. Some ideas:
>>>>>
>>>>> A)  I think what we're missing is feedback from fru-device to ipmid
>>>>> that the FRU write operation has actually completed and the FRU data
>>>>> was re-read. The ipmid should probably not accept any additional write
>>>>> requests until the previous write request has completed and the new
>>>>> FRU data is available.
>>>>>
>>>>> B) If ipmiStorageWriteFruData cannot detect the expected size of the
>>>>> FRU data, it relies on a timeout to eventually write the blob if no
>>>>> more data was received from IPMI. I wonder if this mechanism is "too
>>>>> clever" and we should simply always just wait for the timeout?
>>>>>
>>>>> C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>>>>> was called. Maybe we should keep that cache around until the timeout
>>>>> expires?
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Thanks
>>>>> Oskar.
>>>>>
>>>>

[-- Attachment #2: Type: text/html, Size: 12899 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ipmitool fru write 0 - does not update "baseboard" FRU
  2024-05-09 16:17         ` Willy Tu
@ 2024-05-20 13:03           ` Oskar Senft
  0 siblings, 0 replies; 7+ messages in thread
From: Oskar Senft @ 2024-05-20 13:03 UTC (permalink / raw)
  To: Willy Tu; +Cc: OpenBMC Maillist, Ali El-Haj-Mahmoud

[-- Attachment #1: Type: text/plain, Size: 11790 bytes --]

Thanks for the input. It took a little while (other commitments ...), but I
now have a version that solves the problem and shouldn't break existing
behavior: https://gerrit.openbmc.org/c/openbmc/phosphor-host-ipmid/+/71530

I added you as a reviewer, Willy.

Oskar

On Thu, May 9, 2024 at 6:17 PM Willy Tu <wltu@google.com> wrote:

> I am not aware of a way to differentiate between the clients. Might just
> have to error out as you suggested.
>
> Willy Tu
>
> On Tue, May 7, 2024 at 5:05 PM Oskar Senft <osk@google.com> wrote:
>
>> I got some code changes that look right, but haven't done much testing
>> yet. I realized that a read cache is useful, so I kept it. I split read and
>> write cache by defining a new class that keeps all the related data
>> together.
>>
>> Any suggestions on how to distinguish different clients? Or maybe we just
>> error out when we receive a request for a different FRU while we're still
>> not done with the first one?
>>
>> Oskar.
>>
>> On Tue, May 7, 2024 at 7:49 PM Willy Tu <wltu@google.com> wrote:
>>
>>> I think we should make it so that if a different IPMI client tries to
>>> write the Fru we prevent it and only allow one write at a time. I think the
>>> fruCache for Read is mostly from some commands that is dealing with one
>>> device ID multiple times. When reading subsequent ids... then it doesn't do
>>> anything.
>>>
>>> I think we will need more discussion on that topic since it will be a
>>> larger refactor to make that work.
>>>
>>> Sounds good. Please let me know how it goes.
>>>
>>> Willy Tu
>>>
>>> On Tue, May 7, 2024 at 10:32 AM Oskar Senft <osk@google.com> wrote:
>>>
>>>> Hi Willy
>>>>
>>>> Thanks for your input!
>>>>
>>>> From what I can tell, the current implementation will fail in wondrous
>>>> ways if there's more than one IPMI client trying to write FRU at the same
>>>> time. The existing getFru guards against changing target devId between
>>>> calls to not hand-out the same cache for different requests. However, this
>>>> will clearly break when different IPMI clients attempt to write the same or
>>>> different FRUs at the same time.
>>>>
>>>> We could argue whether that's a supported use case or if we just assume
>>>> that'll never happen ... it does seem like quite a bit of an edge case,
>>>> though.
>>>>
>>>> I do see it as an issue if there were multiple clients with only one
>>>> writing, but others reading - that'll fail in similarly weird ways.
>>>>
>>>> I'm wondering: Do we want to have the fruCache for _reads_ at all? It
>>>> seems actually quite wrong, since subsequent reads for the same FRU would
>>>> always return the same result, even if the FRU changed through some other
>>>> mechanism.
>>>>
>>>> Let me work on a fix that would use the cache only for writing and
>>>> would keep it around until the timeout expired.
>>>>
>>>> Oskar.
>>>>
>>>> On Fri, May 3, 2024 at 11:38 AM Willy Tu <wltu@google.com> wrote:
>>>>
>>>>> Hi Oskar,
>>>>>
>>>>> > C) ipmiStorageWriteFruData clears the cache immediately after
>>>>> WriteFru was called. Maybe we should keep that cache around until the
>>>>> timeout expires?
>>>>>
>>>>> It seems like this is an issue of multiple clients taking to ipmid. In
>>>>> the middle of writing... There is another client that is reading or
>>>>> something
>>>>> else and will reset the fruCache as you mentioned. In that case, I
>>>>> think it may be best to refactor it out to use another getFru method...
>>>>> maybe like getFruToWrite
>>>>>
>>>>> Maybe just something like this
>>>>>
>>>>> ```
>>>>> std::vector<uint8_t> getFruToWrite(...){
>>>>>   if (writeTimer->isRunning()) {
>>>>>     return fruCacheForWrite;
>>>>>   }
>>>>>   auto [_, fruCacheForWrite] = getFru(...);
>>>>>   return fruCacheForWrite;
>>>>> }
>>>>> ```
>>>>>
>>>>> Also need to change `writeFruCache` and such to use `fruCacheForWrite`
>>>>> and such.
>>>>>
>>>>> Please let me know if that works for you. Feel free to reach out
>>>>> internally for faster feedback.
>>>>>
>>>>> Willy Tu
>>>>>
>>>>> On Thu, May 2, 2024 at 11:32 AM Oskar Senft <osk@google.com> wrote:
>>>>>
>>>>>> Hi everyone
>>>>>>
>>>>>> tl;dr - Attempting "ipmitool fru write" with an input file that
>>>>>> contains additional bytes beyond the actual FRU data does not actually
>>>>>> update the FRU on OpenBMC at head w/ entity-manager.
>>>>>>
>>>>>> Details:
>>>>>>
>>>>>> I ran into an issue with updating the "baseboard" FRU (0), which is
>>>>>> stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
>>>>>> specific to FRU 0 and could apply to other updateable FRUs in the same
>>>>>> way.
>>>>>>
>>>>>>
>>>>>> 1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
>>>>>> contains chassis type (that's required for entity-manager's fru-device
>>>>>> to recognize the file.
>>>>>>
>>>>>> root@akita:~# ls -al /etc/fru/baseboard.fru.bin
>>>>>> -rw-r--r--    1 root     root           512 Sep 21 05:21
>>>>>> /etc/fru/baseboard.fru.bin
>>>>>>
>>>>>> root@akita:~# hexdump -C /etc/fru/baseboard.fru.bin
>>>>>> 00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6
>>>>>> |................|
>>>>>> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>>> |................|
>>>>>> *
>>>>>> 00000200
>>>>>>
>>>>>> root@akita:~# ipmitool fru print 0
>>>>>>  Chassis Type          : Rack Mount Chassis
>>>>>>  Chassis Area Checksum : OK
>>>>>>
>>>>>>
>>>>>> 2. Prepare a FRU file with additional data, e.g. with frugy:
>>>>>>
>>>>>> (frugy) osk@osk:~$ cat demo.yml
>>>>>> ChassisInfo:
>>>>>>   type: 23
>>>>>>   part_number: '4711'
>>>>>>   serial_number: '12345'
>>>>>>
>>>>>> (frugy) osk@osk:~$ frugy demo.yml -o demo.bin
>>>>>>
>>>>>> (frugy) osk@osk:~$ ls -al demo.bin
>>>>>> -rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin
>>>>>>
>>>>>> (frugy) osk@osk:~$ hexdump -C demo.bin
>>>>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>>>>> |............4711|
>>>>>> 00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
>>>>>> 00000018
>>>>>>
>>>>>> Note that frugy generates a minimal binary by default. However, other
>>>>>> processes may not and instead produce a fixed size binary blob. This
>>>>>> happens, e.g. when performing "ipmitool fru read" which returns the
>>>>>> whole contents of the underlying storage. A subsequent "ipmitool fru
>>>>>> write" with that blob will fail as explained here.
>>>>>>
>>>>>> To simulate this here, increase the file to 256 bytes:
>>>>>>
>>>>>> (frugy) osk@osk:~$ cp demo.bin demo-256.bin
>>>>>> (frugy) osk@osk:~$ dd if=/dev/zero bs=1 seek=256 count=0
>>>>>> of=demo-256.bin
>>>>>> 0+0 records in
>>>>>> 0+0 records out
>>>>>> 0 bytes copied, 5.1138e-05 s, 0.0 kB/s
>>>>>>
>>>>>> (frugy) osk@osk:~$ ls -al demo-256.bin
>>>>>> -rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin
>>>>>>
>>>>>> (frugy) osk@osk:~$ hexdump -C demo-256.bin
>>>>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>>>>> |............4711|
>>>>>> 00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00
>>>>>> |.12345..........|
>>>>>> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>>> |................|
>>>>>> *
>>>>>> 00000100
>>>>>>
>>>>>>
>>>>>> 3. Write the newly generated FRU:
>>>>>>
>>>>>> root@akita:~# ipmitool fru write 0 demo-256.bin
>>>>>> Fru Size         : 512 bytes
>>>>>> Size to Write    : 256 bytes
>>>>>>
>>>>>> Wait ~10 seconds for the fru-device process to reload the contents.
>>>>>>
>>>>>>
>>>>>> 4. Re-read the FRU contents:
>>>>>>
>>>>>> root@akita:~# ipmitool fru print 0
>>>>>>  Chassis Type          : Rack Mount Chassis
>>>>>>  Chassis Area Checksum : OK
>>>>>>
>>>>>>
>>>>>> 5. For comparison, write only the minimal FRU:
>>>>>>
>>>>>> root@akita:~# ipmitool fru write 0 demo.bin
>>>>>> Fru Size         : 512 bytes
>>>>>> Size to Write    : 24 bytes
>>>>>>
>>>>>> Wait ~10 seconds.
>>>>>>
>>>>>> root@akita:~# ipmitool fru print 0
>>>>>>  Chassis Type          : Rack Mount Chassis
>>>>>>  Chassis Part Number   : 4711
>>>>>>  Chassis Serial        : 12345
>>>>>>  Chassis Area Checksum : OK
>>>>>>
>>>>>>
>>>>>> I dug into this and found that this is caused by an interaction
>>>>>> between
>>>>>> https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
>>>>>> and
>>>>>> https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp
>>>>>> .
>>>>>>
>>>>>> Here's what happens:
>>>>>> - The "ipmitool fru write" request is handled by storagecommands.cpp
>>>>>> ipmiStorageWriteFruData.
>>>>>>
>>>>>> - ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
>>>>>> several calls. On the initial call, it requests the current FRU blob
>>>>>> via D-bus from fru-device and caches it in the process. It then
>>>>>> overwrites sections of that cache with the received data from IPMI.
>>>>>>
>>>>>> - ipmiStorageWriteFruData uses information from the FRU header to
>>>>>> check whether it received all the bytes that make up the new FRU. Note
>>>>>> that this could be fewer bytes than the size of the cached blob. Once
>>>>>> it receives all the bytes for the new FRU, it calls
>>>>>> /xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
>>>>>> blob (i.e. the full cache with modifications on top). After that the
>>>>>> cache is cleared.
>>>>>>
>>>>>> - The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
>>>>>> first performs a sanity check and then writes the blob to the
>>>>>> underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
>>>>>> subsequently calls rescanBusses() which actually executes after about
>>>>>> 1 second.
>>>>>>
>>>>>> - Meanwhile, "ipmitool fru write" continues to happily send additional
>>>>>> bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
>>>>>> cache has just been cleared, it retrieves the current FRU from
>>>>>> fru-device again. However, since that has not yet completed
>>>>>> "rescanBusses", it actually received the original FRU again, not the
>>>>>> modified version. The above cycle repeats, but this time the original
>>>>>> FRU with additional modifications from the additional bytes.
>>>>>>
>>>>>> In the best case (if the new FRU data is larger than the original FRU
>>>>>> data), the result is that the FRU did not get updated at all, since we
>>>>>> effectively just wrote back the original FRU. However, if the new FRU
>>>>>> data is smaller than the original FRU data, this may result in
>>>>>> corrupted FRU data to be persisted.
>>>>>>
>>>>>>
>>>>>> I was trying to figure out how to best fix that, but couldn't come up
>>>>>> with a design that would still work. Some ideas:
>>>>>>
>>>>>> A)  I think what we're missing is feedback from fru-device to ipmid
>>>>>> that the FRU write operation has actually completed and the FRU data
>>>>>> was re-read. The ipmid should probably not accept any additional write
>>>>>> requests until the previous write request has completed and the new
>>>>>> FRU data is available.
>>>>>>
>>>>>> B) If ipmiStorageWriteFruData cannot detect the expected size of the
>>>>>> FRU data, it relies on a timeout to eventually write the blob if no
>>>>>> more data was received from IPMI. I wonder if this mechanism is "too
>>>>>> clever" and we should simply always just wait for the timeout?
>>>>>>
>>>>>> C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>>>>>> was called. Maybe we should keep that cache around until the timeout
>>>>>> expires?
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Thanks
>>>>>> Oskar.
>>>>>>
>>>>>

[-- Attachment #2: Type: text/html, Size: 13700 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-05-20 13:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-02 18:31 ipmitool fru write 0 - does not update "baseboard" FRU Oskar Senft
2024-05-03 15:38 ` Willy Tu
2024-05-07 17:32   ` Oskar Senft
2024-05-07 23:49     ` Willy Tu
2024-05-08  0:04       ` Oskar Senft
2024-05-09 16:17         ` Willy Tu
2024-05-20 13:03           ` Oskar Senft

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).