Re: [Qemu-devel] [PATCH v3 4/5] fw_cfg: add generic non-DMA read method

From: Laszlo Ersek <lersek@redhat.com>
To: "Gabriel L. Somlo" <somlo@cmu.edu>, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jordan.l.justen@intel.com,
	markmb@redhat.com, kraxel@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 4/5] fw_cfg: add generic non-DMA read method
Date: Tue, 3 Nov 2015 11:53:55 +0100	[thread overview]
Message-ID: <56389243.4040106@redhat.com> (raw)
In-Reply-To: <1446510945-18477-5-git-send-email-somlo@cmu.edu>

Thank you for splitting out this patch; it makes it easier to review.
However,

On 11/03/15 01:35, Gabriel L. Somlo wrote:
> Introduce fw_cfg_data_read(), a generic read method which works
> on all access widths (1 through 8 bytes, inclusive), and can be
> used during both IOPort and MMIO read accesses.
> 
> To maintain legibility, only fw_cfg_data_mem_read() (the MMIO
> data read method) is replaced by this patch. The new method
> essentially unwinds the fw_cfg_data_mem_read() + fw_cfg_read()
> combo, but without unnecessarily repeating all the validity
> checks performed by the latter on each byte being read.

this unwinding caused a bug to creep in.

Namely, we have to identify the set of data that remains constant
between *all* "size" calls that fw_cfg_data_mem_read() makes to
fw_cfg_read(), and hoist / eliminate the checks on those *only*.

Specifically,

> This patch also modifies the trace_fw_cfg_read prototype to
> accept a 64-bit value argument, allowing it to work properly
> with the new read method, but also remain backward compatible
> with existing call sites.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Marc Marí <markmb@redhat.com>
> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
> ---
>  hw/nvram/fw_cfg.c | 33 +++++++++++++++++++--------------
>  trace-events      |  2 +-
>  2 files changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index c2d3a0a..8aa980c 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -274,6 +274,24 @@ static int fw_cfg_select(FWCfgState *s, uint16_t key)
>      return ret;
>  }
>  
> +static uint64_t fw_cfg_data_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    FWCfgState *s = opaque;

This is good.

> +    int arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);

Okay too.

> +    FWCfgEntry *e = &s->entries[arch][s->cur_entry & FW_CFG_ENTRY_MASK];

(1) Side point: the conversion here is faithful to the original code in
fw_cfg_read(), but even in the original code, the expression uses
"s->cur_entry" as a (masked) subscript *before* comparing it against
FW_CFG_INVALID. I don't think that's right.

The same issue is present in fw_cfg_dma_transfer(). Care to write a
patch (before the restructuring) that fixes both?

Note, I am aware that the expression in both of the above mentioned
functions only calculates the *address* of the nonexistent element
belonging to (FW_CFG_INVALID & FW_CFG_ENTRY_MASK) == 0x3FFF:

  e = &s->entries[arch][s->cur_entry & FW_CFG_ENTRY_MASK];

But it doesn't matter; it's undefined behavior just the same. Instead,
*both* locations should say:

 e = (s->cur_entry == FW_CFG_INVALID) ? NULL :
     &s->entries[arch][s->cur_entry & FW_CFG_ENTRY_MASK];

(I share the blame for not noticing this earlier -- I too reviewed
fw_cfg_dma_transfer().)

NULL is a valid pointer to *evaluate* (not to dereference), whereas the
current address-of expression is not valid even for evaluation. Also, in
practice, dereferencing NULL would give us a nice (as in, non-garbage)
SIGSEGV.

Anyway, back to the topic at hand:

> +    uint64_t value = 0;
> +
> +    assert(size <= sizeof(value));
> +    if (s->cur_entry != FW_CFG_INVALID && e->data) {

Right, good conversion. (Side note: this does protect against
*dereferencing* "e", but it's already too late, as far as undefined
behavior is concerned.)

> +        while (size-- && s->cur_offset < e->len) {
> +            value = (value << 8) | e->data[s->cur_offset++];
> +        }

(2) So, this is the bug. The pre-conversion code would keep shifting
"value" to the left until "size" was reached, regardless of the
underlying blob size, and just leave the least significant bytes zeroed
if the item ended too early. Whereas this loop *stops shifting* when the
blob ends.

Since the wide data register (which is big-endian) implements a
substring-preserving transfer (on top of QEMU's integer preserving
device r/w infrastructure), this change breaks the case when the
firmware reads, say, 8 bytes from the register in a single access, when
only 3 are left in the blob, and then uses only the three *lowest
address* bytes from the uint64_t value read. Although no known firmware
does this at the moment, it would be valid, and the above hunk would
break it.

Hence please

(2a) either append the missing "cumulative" shift after the loop:

    while (size && s->cur_offset < e->len) {
        --size;
        value = (value << 8) | e->data[s->cur_offset++];
    }
    value <<= 8 * size;

(2b) or move the offset check from the loop's controlling expression
into the value composition:

        while (size--) {
            value = (value << 8) | (s->cur_offset < e->len ?
                                    e->data[s->cur_offset++] :
                                    0);
        }

The rest looks good.

Thanks
Laszlo

> +    }
> +
> +    trace_fw_cfg_read(s, value);
> +    return value;
> +}
> +
>  static uint8_t fw_cfg_read(FWCfgState *s)
>  {
>      int arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);
> @@ -290,19 +308,6 @@ static uint8_t fw_cfg_read(FWCfgState *s)
>      return ret;
>  }
>  
> -static uint64_t fw_cfg_data_mem_read(void *opaque, hwaddr addr,
> -                                     unsigned size)
> -{
> -    FWCfgState *s = opaque;
> -    uint64_t value = 0;
> -    unsigned i;
> -
> -    for (i = 0; i < size; ++i) {
> -        value = (value << 8) | fw_cfg_read(s);
> -    }
> -    return value;
> -}
> -
>  static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
>                                    uint64_t value, unsigned size)
>  {
> @@ -483,7 +488,7 @@ static const MemoryRegionOps fw_cfg_ctl_mem_ops = {
>  };
>  
>  static const MemoryRegionOps fw_cfg_data_mem_ops = {
> -    .read = fw_cfg_data_mem_read,
> +    .read = fw_cfg_data_read,
>      .write = fw_cfg_data_mem_write,
>      .endianness = DEVICE_BIG_ENDIAN,
>      .valid = {
> diff --git a/trace-events b/trace-events
> index 72136b9..5073040 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -196,7 +196,7 @@ ecc_diag_mem_readb(uint64_t addr, uint32_t ret) "Read diagnostic %"PRId64"= %02x
>  
>  # hw/nvram/fw_cfg.c
>  fw_cfg_select(void *s, uint16_t key, int ret) "%p key %d = %d"
> -fw_cfg_read(void *s, uint8_t ret) "%p = %d"
> +fw_cfg_read(void *s, uint64_t ret) "%p = %"PRIx64
>  fw_cfg_add_file(void *s, int index, char *name, size_t len) "%p #%d: %s (%zd bytes)"
>  
>  # hw/block/hd-geometry.c
>