ipmitool fru write 0

* ipmitool fru write 0 - does not update "baseboard" FRU
@ 2024-05-02 18:31 Oskar Senft
  2024-05-03 15:38 ` Willy Tu
  0 siblings, 1 reply; 7+ messages in thread
From: Oskar Senft @ 2024-05-02 18:31 UTC (permalink / raw)
  To: OpenBMC Maillist; +Cc: Ed Tanous, Ali El-Haj-Mahmoud

Hi everyone

tl;dr - Attempting "ipmitool fru write" with an input file that
contains additional bytes beyond the actual FRU data does not actually
update the FRU on OpenBMC at head w/ entity-manager.

Details:

I ran into an issue with updating the "baseboard" FRU (0), which is
stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
specific to FRU 0 and could apply to other updateable FRUs in the same
way.

1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
contains chassis type (that's required for entity-manager's fru-device
to recognize the file.

root@akita:~# ls -al /etc/fru/baseboard.fru.bin
-rw-r--r--    1 root     root           512 Sep 21 05:21
/etc/fru/baseboard.fru.bin

root@akita:~# hexdump -C /etc/fru/baseboard.fru.bin
00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200

root@akita:~# ipmitool fru print 0
 Chassis Type          : Rack Mount Chassis
 Chassis Area Checksum : OK

2. Prepare a FRU file with additional data, e.g. with frugy:

(frugy) osk@osk:~$ cat demo.yml
ChassisInfo:
  type: 23
  part_number: '4711'
  serial_number: '12345'

(frugy) osk@osk:~$ frugy demo.yml -o demo.bin

(frugy) osk@osk:~$ ls -al demo.bin
-rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin

(frugy) osk@osk:~$ hexdump -C demo.bin
00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31  |............4711|
00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
00000018

Note that frugy generates a minimal binary by default. However, other
processes may not and instead produce a fixed size binary blob. This
happens, e.g. when performing "ipmitool fru read" which returns the
whole contents of the underlying storage. A subsequent "ipmitool fru
write" with that blob will fail as explained here.

To simulate this here, increase the file to 256 bytes:

(frugy) osk@osk:~$ cp demo.bin demo-256.bin
(frugy) osk@osk:~$ dd if=/dev/zero bs=1 seek=256 count=0 of=demo-256.bin
0+0 records in
0+0 records out
0 bytes copied, 5.1138e-05 s, 0.0 kB/s

(frugy) osk@osk:~$ ls -al demo-256.bin
-rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin

(frugy) osk@osk:~$ hexdump -C demo-256.bin
00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31  |............4711|
00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00  |.12345..........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100

3. Write the newly generated FRU:

root@akita:~# ipmitool fru write 0 demo-256.bin
Fru Size         : 512 bytes
Size to Write    : 256 bytes

Wait ~10 seconds for the fru-device process to reload the contents.

4. Re-read the FRU contents:

root@akita:~# ipmitool fru print 0
 Chassis Type          : Rack Mount Chassis
 Chassis Area Checksum : OK

5. For comparison, write only the minimal FRU:

root@akita:~# ipmitool fru write 0 demo.bin
Fru Size         : 512 bytes
Size to Write    : 24 bytes

Wait ~10 seconds.

root@akita:~# ipmitool fru print 0
 Chassis Type          : Rack Mount Chassis
 Chassis Part Number   : 4711
 Chassis Serial        : 12345
 Chassis Area Checksum : OK

I dug into this and found that this is caused by an interaction
between https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
and https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp.

Here's what happens:
- The "ipmitool fru write" request is handled by storagecommands.cpp
ipmiStorageWriteFruData.

- ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
several calls. On the initial call, it requests the current FRU blob
via D-bus from fru-device and caches it in the process. It then
overwrites sections of that cache with the received data from IPMI.

- ipmiStorageWriteFruData uses information from the FRU header to
check whether it received all the bytes that make up the new FRU. Note
that this could be fewer bytes than the size of the cached blob. Once
it receives all the bytes for the new FRU, it calls
/xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
blob (i.e. the full cache with modifications on top). After that the
cache is cleared.

- The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
first performs a sanity check and then writes the blob to the
underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
subsequently calls rescanBusses() which actually executes after about
1 second.

- Meanwhile, "ipmitool fru write" continues to happily send additional
bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
cache has just been cleared, it retrieves the current FRU from
fru-device again. However, since that has not yet completed
"rescanBusses", it actually received the original FRU again, not the
modified version. The above cycle repeats, but this time the original
FRU with additional modifications from the additional bytes.

In the best case (if the new FRU data is larger than the original FRU
data), the result is that the FRU did not get updated at all, since we
effectively just wrote back the original FRU. However, if the new FRU
data is smaller than the original FRU data, this may result in
corrupted FRU data to be persisted.

I was trying to figure out how to best fix that, but couldn't come up
with a design that would still work. Some ideas:

A)  I think what we're missing is feedback from fru-device to ipmid
that the FRU write operation has actually completed and the FRU data
was re-read. The ipmid should probably not accept any additional write
requests until the previous write request has completed and the new
FRU data is available.

B) If ipmiStorageWriteFruData cannot detect the expected size of the
FRU data, it relies on a timeout to eventually write the blob if no
more data was received from IPMI. I wonder if this mechanism is "too
clever" and we should simply always just wait for the timeout?

C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
was called. Maybe we should keep that cache around until the timeout
expires?

Thoughts?

Thanks
Oskar.

^ permalink raw reply	[flat|nested] 7+ messages in thread