All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric DeVolder <eric.devolder@oracle.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: ehabkost@redhat.com, mst@redhat.com, konrad.wilk@oracle.com,
	qemu-devel@nongnu.org, pbonzini@redhat.com,
	boris.ostrovsky@oracle.com, rth@twiddle.net
Subject: Re: [PATCH v5 05/10] ACPI ERST: support for ACPI ERST feature
Date: Wed, 21 Jul 2021 11:07:40 -0500	[thread overview]
Message-ID: <0a09250e-4347-0ad6-d95c-0bfd2094b98b@oracle.com> (raw)
In-Reply-To: <20210720141704.381734cc@redhat.com>



On 7/20/21 7:17 AM, Igor Mammedov wrote:
> On Wed, 30 Jun 2021 15:07:16 -0400
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
>> This change implements the support for the ACPI ERST feature.
> Drop this
Done

> 
>>
>> This implements a PCI device for ACPI ERST. This implments the
> s/implments/implements/
Corrected

> 
>> non-NVRAM "mode" of operation for ERST.
> add here why non-NVRAM "mode" is implemented.
How about:
This implements a PCI device for ACPI ERST. This implments the
non-NVRAM "mode" of operation for ERST as it is supported by
Linux and Windows and aligns with ERST support in most BIOS.


> 
> Also even if this non-NVRAM implementation, there is still
> a lot of not necessary data copying (see below) so drop it
> or justify why it's there.
>   
>> This change also includes erst.c in the build of general ACPI support.
> Drop this as well
Done

> 
> 
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   hw/acpi/erst.c      | 704 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   hw/acpi/meson.build |   1 +
>>   2 files changed, 705 insertions(+)
>>   create mode 100644 hw/acpi/erst.c
>>
>> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
>> new file mode 100644
>> index 0000000..6e9bd2e
>> --- /dev/null
>> +++ b/hw/acpi/erst.c
>> @@ -0,0 +1,704 @@
>> +/*
>> + * ACPI Error Record Serialization Table, ERST, Implementation
>> + *
>> + * Copyright (c) 2021 Oracle and/or its affiliates.
>> + *
>> + * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
>> + * ACPI Platform Error Interfaces : Error Serialization
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation;
>> + * version 2 of the License.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
>> + */
>> +
>> +#include <sys/types.h>
>> +#include <sys/stat.h>
>> +#include <unistd.h>
>> +
>> +#include "qemu/osdep.h"
>> +#include "qapi/error.h"
>> +#include "hw/qdev-core.h"
>> +#include "exec/memory.h"
>> +#include "qom/object.h"
>> +#include "hw/pci/pci.h"
>> +#include "qom/object_interfaces.h"
>> +#include "qemu/error-report.h"
>> +#include "migration/vmstate.h"
>> +#include "hw/qdev-properties.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "hw/acpi/acpi-defs.h"
>> +#include "hw/acpi/aml-build.h"
>> +#include "hw/acpi/bios-linker-loader.h"
>> +#include "exec/address-spaces.h"
>> +#include "sysemu/hostmem.h"
>> +#include "hw/acpi/erst.h"
>> +#include "trace.h"
>> +
>> +/* UEFI 2.1: Append N Common Platform Error Record */
>> +#define UEFI_CPER_RECORD_MIN_SIZE 128U
>> +#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U
>> +#define UEFI_CPER_RECORD_ID_OFFSET 96U
>> +#define IS_UEFI_CPER_RECORD(ptr) \
>> +    (((ptr)[0] == 'C') && \
>> +     ((ptr)[1] == 'P') && \
>> +     ((ptr)[2] == 'E') && \
>> +     ((ptr)[3] == 'R'))
>> +#define THE_UEFI_CPER_RECORD_ID(ptr) \
>> +    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
>> +
>> +/*
>> + * This implementation is an ACTION (cmd) and VALUE (data)
>> + * interface consisting of just two 64-bit registers.
>> + */
>> +#define ERST_REG_SIZE (2UL * sizeof(uint64_t))
> 
>> +#define ERST_CSR_ACTION (0UL << 3) /* action (cmd) */
>> +#define ERST_CSR_VALUE  (1UL << 3) /* argument/value (data) */
> what's meaning of CRS?
CSR = control status register
> Looking at patch both should be called ERST_[ACTION|VALUE]_OFFSET
Done
> pls use explicit offset values instead of shifting bit.
Done
> 
> 
>> +/*
>> + * ERST_RECORD_SIZE is the buffer size for exchanging ERST
>> + * record contents. Thus, it defines the maximum record size.
>> + * As this is mapped through a PCI BAR, it must be a power of
>> + * two, and should be at least PAGE_SIZE.
>> + * Records are stored in the backing file in a simple fashion.
>> + * The backing file is essentially divided into fixed size
>> + * "slots", ERST_RECORD_SIZE in length, with each "slot"
>> + * storing a single record. No attempt at optimizing storage
>> + * through compression, compaction, etc is attempted.
>> + * NOTE that any change to this value will make any pre-
>> + * existing backing files, not of the same ERST_RECORD_SIZE,
>> + * unusable to the guest.
>> + */
>> +/* 8KiB records, not too small, not too big */
>> +#define ERST_RECORD_SIZE (2UL * 4096)
>> +
>> +#define ERST_INVALID_RECORD_ID (~0UL)
>> +#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
>> +
>> +/*
>> + * Object cast macro
>> + */
>> +#define ACPIERST(obj) \
>> +    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
>> +
>> +/*
>> + * Main ERST device state structure
>> + */
>> +typedef struct {
>> +    PCIDevice parent_obj;
>> +
>> +    HostMemoryBackend *hostmem;
>> +    MemoryRegion *hostmem_mr;
>> +
>> +    MemoryRegion iomem; /* programming registes */
>> +    MemoryRegion nvmem; /* exchange buffer */
>> +    uint32_t prop_size;
> s/^^^/storage_size/
Corrected

> 
>> +    hwaddr bar0; /* programming registers */
>> +    hwaddr bar1; /* exchange buffer */
> why do you need to keep this addresses around?
> Suggest to drop these fields and use local variables or pci_get_bar_addr() at call site.
Corrected

> 
>> +
>> +    uint8_t operation;
>> +    uint8_t busy_status;
>> +    uint8_t command_status;
>> +    uint32_t record_offset;
>> +    uint32_t record_count;
>> +    uint64_t reg_action;
>> +    uint64_t reg_value;
>> +    uint64_t record_identifier;
>> +
>> +    unsigned next_record_index;
> 
> 
>> +    uint8_t record[ERST_RECORD_SIZE]; /* read/written directly by guest */
>> +    uint8_t tmp_record[ERST_RECORD_SIZE]; /* intermediate manipulation buffer */
> drop these see [**] below
Corrected

> 
>> +
>> +} ERSTDeviceState;
>> +
>> +/*******************************************************************/
>> +/*******************************************************************/
>> +
>> +static unsigned copy_from_nvram_by_index(ERSTDeviceState *s, unsigned index)
>> +{
>> +    /* Read an nvram entry into tmp_record */
>> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
>> +    off_t offset = (index * ERST_RECORD_SIZE);
>> +
>> +    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
>> +        if (s->hostmem_mr) {
>> +            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
>> +            memcpy(s->tmp_record, p + offset, ERST_RECORD_SIZE);
>> +            rc = ACPI_ERST_STATUS_SUCCESS;
>> +        }
>> +    }
>> +    return rc;
>> +}
>> +
>> +static unsigned copy_to_nvram_by_index(ERSTDeviceState *s, unsigned index)
>> +{
>> +    /* Write entry in tmp_record into nvram, and backing file */
>> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
>> +    off_t offset = (index * ERST_RECORD_SIZE);
>> +
>> +    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
>> +        if (s->hostmem_mr) {
>> +            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
>> +            memcpy(p + offset, s->tmp_record, ERST_RECORD_SIZE);
>> +            rc = ACPI_ERST_STATUS_SUCCESS;
>> +        }
>> +    }
>> +    return rc;
>> +}
>> +
>> +static int lookup_erst_record_by_identifier(ERSTDeviceState *s,
>> +    uint64_t record_identifier, bool *record_found, bool alloc_for_write)
>> +{
>> +    int rc = -1;
>> +    int empty_index = -1;
>> +    int index = 0;
>> +    unsigned rrc;
>> +
>> +    *record_found = 0;
>> +
>> +    do {
>> +        rrc = copy_from_nvram_by_index(s, (unsigned)index);
> 
> you have direct access to backend memory so there is no need
> whatsoever to copy records from it to an intermediate buffer
> everywhere. Almost all operations with records can be done
> in place modulo EXECUTE_OPERATION action in BEGIN_[READ|WRITE]
> context, where record is moved between backend and guest buffer.
> 
> So please eliminate all not necessary copying.
> (for fun, time operations and set backend size to some huge
> value to see how expensive this code is)

I've corrected this. In our previous exchangs, I thought the reference
to copying was about trying to directly have guest write/read the appropriate
record in the backend storage. After reading this comment I realized that
yes I was doing alot of copying (an artifact of the transition away from
direct file i/o to MemoryBackend). So good find, and I've eliminated the
intermediate copying.

> 
>> +        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
>> +            uint64_t this_identifier;
>> +            this_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
>> +            if (IS_UEFI_CPER_RECORD(s->tmp_record) &&
>> +                (this_identifier == record_identifier)) {
>> +                rc = index;
>> +                *record_found = 1;
>> +                break;
>> +            }
>> +            if ((this_identifier == ERST_INVALID_RECORD_ID) &&
>> +                (empty_index < 0)) {
>> +                empty_index = index; /* first available for write */
>> +            }
>> +        }
>> +        ++index;
>> +    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
>> +
>> +    /* Record not found, allocate for writing */
>> +    if ((rc < 0) && alloc_for_write) {
>> +        rc = empty_index;
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned clear_erst_record(ERSTDeviceState *s)
>> +{
>> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
>> +    bool record_found;
>> +    int index;
>> +
>> +    index = lookup_erst_record_by_identifier(s,
>> +        s->record_identifier, &record_found, 0);
>> +    if (record_found) {
>> +        memset(s->tmp_record, 0xFF, ERST_RECORD_SIZE);
>> +        rc = copy_to_nvram_by_index(s, (unsigned)index);
>> +        if (rc == ACPI_ERST_STATUS_SUCCESS) {
>> +            s->record_count -= 1;
>> +        }
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned write_erst_record(ERSTDeviceState *s)
>> +{
>> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
>> +
>> +    if (s->record_offset < (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
>> +        uint64_t record_identifier;
>> +        uint8_t *record = &s->record[s->record_offset];
>> +        bool record_found;
>> +        int index;
>> +
>> +        record_identifier = (s->record_identifier == ERST_INVALID_RECORD_ID)
>> +            ? THE_UEFI_CPER_RECORD_ID(record) : s->record_identifier;
>> +
>> +        index = lookup_erst_record_by_identifier(s,
>> +            record_identifier, &record_found, 1);
>> +        if (index < 0) {
>> +            rc = ACPI_ERST_STATUS_NOT_ENOUGH_SPACE;
>> +        } else {
>> +            if (0 != s->record_offset) {
>> +                memset(&s->tmp_record[ERST_RECORD_SIZE - s->record_offset],
>> +                    0xFF, s->record_offset);
>> +            }
>> +            memcpy(s->tmp_record, record, ERST_RECORD_SIZE - s->record_offset);
>> +            rc = copy_to_nvram_by_index(s, (unsigned)index);
>> +            if (rc == ACPI_ERST_STATUS_SUCCESS) {
>> +                if (!record_found) { /* not overwriting existing record */
>> +                    s->record_count += 1; /* writing new record */
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned next_erst_record(ERSTDeviceState *s,
>> +    uint64_t *record_identifier)
>> +{
>> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
>> +    unsigned index;
>> +    unsigned rrc;
>> +
>> +    *record_identifier = ERST_INVALID_RECORD_ID;
>> +
>> +    index = s->next_record_index;
>> +    do {
>> +        rrc = copy_from_nvram_by_index(s, (unsigned)index);
>> +        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
>> +            if (IS_UEFI_CPER_RECORD(s->tmp_record)) {
>> +                s->next_record_index = index + 1; /* where to start next time */
>> +                *record_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
>> +                rc = ACPI_ERST_STATUS_SUCCESS;
>> +                break;
>> +            }
>> +            ++index;
>> +        } else {
>> +            if (s->next_record_index == 0) {
>> +                rc = ACPI_ERST_STATUS_RECORD_STORE_EMPTY;
>> +            }
>> +            s->next_record_index = 0; /* at end, reset */
>> +        }
>> +    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned read_erst_record(ERSTDeviceState *s)
>> +{
>> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
>> +    bool record_found;
>> +    int index;
>> +
>> +    index = lookup_erst_record_by_identifier(s,
>> +        s->record_identifier, &record_found, 0);
>> +    if (record_found) {
>> +        rc = copy_from_nvram_by_index(s, (unsigned)index);
>> +        if (rc == ACPI_ERST_STATUS_SUCCESS) {
>> +            if (s->record_offset < ERST_RECORD_SIZE) {
>> +                memcpy(&s->record[s->record_offset], s->tmp_record,
>> +                    ERST_RECORD_SIZE - s->record_offset);
>> +            }
>> +        }
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned get_erst_record_count(ERSTDeviceState *s)
>> +{
>> +    /* Compute record_count */
>> +    unsigned index = 0;
>> +
>> +    s->record_count = 0;
>> +    while (copy_from_nvram_by_index(s, index) == ACPI_ERST_STATUS_SUCCESS) {
>> +        uint8_t *ptr = &s->tmp_record[0];
>> +        uint64_t record_identifier = THE_UEFI_CPER_RECORD_ID(ptr);
>> +        if (IS_UEFI_CPER_RECORD(ptr) &&
>> +            (ERST_INVALID_RECORD_ID != record_identifier)) {
>> +            s->record_count += 1;
>> +        }
>> +        ++index;
>> +    }
>> +    return s->record_count;
>> +}
>> +
>> +/*******************************************************************/
>> +
>> +static uint64_t erst_rd_reg64(hwaddr addr,
>> +    uint64_t reg, unsigned size)
>> +{
>> +    uint64_t rdval;
>> +    uint64_t mask;
>> +    unsigned shift;
>> +
>> +    if (size == sizeof(uint64_t)) {
>> +        /* 64b access */
>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
>> +        shift = 0;
>> +    } else {
>> +        /* 32b access */
>> +        mask = 0x00000000FFFFFFFFUL;
>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
>> +    }
>> +
>> +    rdval = reg;
>> +    rdval >>= shift;
>> +    rdval &= mask;
>> +
>> +    return rdval;
>> +}
>> +
>> +static uint64_t erst_wr_reg64(hwaddr addr,
>> +    uint64_t reg, uint64_t val, unsigned size)
>> +{
>> +    uint64_t wrval;
>> +    uint64_t mask;
>> +    unsigned shift;
>> +
>> +    if (size == sizeof(uint64_t)) {
>> +        /* 64b access */
>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
>> +        shift = 0;
>> +    } else {
>> +        /* 32b access */
>> +        mask = 0x00000000FFFFFFFFUL;
>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
>> +    }
>> +
>> +    val &= mask;
>> +    val <<= shift;
>> +    mask <<= shift;
>> +    wrval = reg;
>> +    wrval &= ~mask;
>> +    wrval |= val;
>> +
>> +    return wrval;
>> +}
> (I see in next patch it's us defining access width in the ACPI tables)
> so question is: do we have to have mixed register width access?
> can't all register accesses be 64-bit?

Initially I attempted to just use 64-bit exclusively. The problem is that,
for reasons I don't understand, the OSPM on Linux, even x86_64, breaks a 64b
register access into two. Here's an example of reading the exchange buffer
address, which is coded as a 64b access:

acpi_erst_reg_write addr: 0x0000 <== 0x000000000000000d (size: 4)
acpi_erst_reg_read  addr: 0x0008 ==> 0x00000000c1010000 (size: 4)
acpi_erst_reg_read  addr: 0x000c ==> 0x0000000000000000 (size: 4)

So I went ahead and made ACTION register accesses 32b, else there would
be two reads of 32-bts, of which the second is useless.

> 
>> +static void erst_reg_write(void *opaque, hwaddr addr,
>> +    uint64_t val, unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>> +
>> +    /*
>> +     * NOTE: All actions/operations/side effects happen on the WRITE,
>> +     * by design. The READs simply return the reg_value contents.
>> +     */
>> +    trace_acpi_erst_reg_write(addr, val, size);
>> +
>> +    switch (addr) {
>> +    case ERST_CSR_VALUE + 0:
>> +    case ERST_CSR_VALUE + 4:
>> +        s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
>> +        break;
>> +    case ERST_CSR_ACTION + 0:
>> +/*  case ERST_CSR_ACTION+4: as coded, not really a 64b register */
>> +        switch (val) {
>> +        case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
>> +        case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
>> +        case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
>> +        case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
>> +        case ACPI_ERST_ACTION_END_OPERATION:
>> +            s->operation = val;
>> +            break;
>> +        case ACPI_ERST_ACTION_SET_RECORD_OFFSET:
>> +            s->record_offset = s->reg_value;
>> +            break;
>> +        case ACPI_ERST_ACTION_EXECUTE_OPERATION:
>> +            if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
>> +                s->busy_status = 1;
>> +                switch (s->operation) {
>> +                case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
>> +                    s->command_status = write_erst_record(s);
>> +                    break;
>> +                case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
>> +                    s->command_status = read_erst_record(s);
>> +                    break;
>> +                case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
>> +                    s->command_status = clear_erst_record(s);
>> +                    break;
>> +                case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
>> +                    s->command_status = ACPI_ERST_STATUS_SUCCESS;
>> +                    break;
>> +                case ACPI_ERST_ACTION_END_OPERATION:
>> +                    s->command_status = ACPI_ERST_STATUS_SUCCESS;
>> +                    break;
>> +                default:
>> +                    s->command_status = ACPI_ERST_STATUS_FAILED;
>> +                    break;
>> +                }
>> +                s->record_identifier = ERST_INVALID_RECORD_ID;
>> +                s->busy_status = 0;
>> +            }
>> +            break;
>> +        case ACPI_ERST_ACTION_CHECK_BUSY_STATUS:
>> +            s->reg_value = s->busy_status;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_COMMAND_STATUS:
>> +            s->reg_value = s->command_status;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER:
>> +            s->command_status = next_erst_record(s, &s->reg_value);
>> +            break;
>> +        case ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER:
>> +            s->record_identifier = s->reg_value;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_RECORD_COUNT:
>> +            s->reg_value = s->record_count;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
>> +            s->reg_value = s->bar1;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
>> +            s->reg_value = ERST_RECORD_SIZE;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
>> +            s->reg_value = 0x0; /* intentional, not NVRAM mode */
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS:
>> +            /*
>> +             * 100UL is max, 10UL is nominal
> 100/10 of what, also add reference to spec/table it comes from
> and explain in comment why theses values were chosen
I've changed the comment and style to be similar to build_amd_iommu().
These are merely sane non-zero max/min times.

> 
>> +             */
>> +            s->reg_value = ((100UL << 32) | (10UL << 0));
>> +            break;
>> +        case ACPI_ERST_ACTION_RESERVED:
> not necessary, it will be handled by 'default:'
Corrected

> 
>> +        default:
>> +            /*
>> +             * Unknown action/command, NOP
>> +             */
>> +            break;
>> +        }
>> +        break;
>> +    default:
>> +        /*
>> +         * This should not happen, but if it does, NOP
>> +         */
>> +        break;
>> +    }
>> +}
>> +
>> +static uint64_t erst_reg_read(void *opaque, hwaddr addr,
>> +                                unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>> +    uint64_t val = 0;
>> +
>> +    switch (addr) {
>> +    case ERST_CSR_ACTION + 0:
>> +    case ERST_CSR_ACTION + 4:
>> +        val = erst_rd_reg64(addr, s->reg_action, size);
>> +        break;
>> +    case ERST_CSR_VALUE + 0:
>> +    case ERST_CSR_VALUE + 4:
>> +        val = erst_rd_reg64(addr, s->reg_value, size);
>> +        break;
>> +    default:
>> +        break;
>> +    }
>> +    trace_acpi_erst_reg_read(addr, val, size);
>> +    return val;
>> +}
>> +
>> +static const MemoryRegionOps erst_reg_ops = {
>> +    .read = erst_reg_read,
>> +    .write = erst_reg_write,
>> +    .endianness = DEVICE_NATIVE_ENDIAN,
>> +};
>> +
>> +static void erst_mem_write(void *opaque, hwaddr addr,
>> +    uint64_t val, unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> 
>> +    uint8_t *ptr = &s->record[addr - 0];
>> +    trace_acpi_erst_mem_write(addr, val, size);
>> +    switch (size) {
>> +    default:
>> +    case sizeof(uint8_t):
>> +        *(uint8_t *)ptr = (uint8_t)val;
>> +        break;
>> +    case sizeof(uint16_t):
>> +        *(uint16_t *)ptr = (uint16_t)val;
>> +        break;
>> +    case sizeof(uint32_t):
>> +        *(uint32_t *)ptr = (uint32_t)val;
>> +        break;
>> +    case sizeof(uint64_t):
>> +        *(uint64_t *)ptr = (uint64_t)val;
>> +        break;
>> +    }
>> +}
>> +
>> +static uint64_t erst_mem_read(void *opaque, hwaddr addr,
>> +                                unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>> +    uint8_t *ptr = &s->record[addr - 0];
>> +    uint64_t val = 0;
>> +    switch (size) {
>> +    default:
>> +    case sizeof(uint8_t):
>> +        val = *(uint8_t *)ptr;
>> +        break;
>> +    case sizeof(uint16_t):
>> +        val = *(uint16_t *)ptr;
>> +        break;
>> +    case sizeof(uint32_t):
>> +        val = *(uint32_t *)ptr;
>> +        break;
>> +    case sizeof(uint64_t):
>> +        val = *(uint64_t *)ptr;
>> +        break;
>> +    }
>> +    trace_acpi_erst_mem_read(addr, val, size);
>> +    return val;
>> +}
>> +
>> +static const MemoryRegionOps erst_mem_ops = {
>> +    .read = erst_mem_read,
>> +    .write = erst_mem_write,
>> +    .endianness = DEVICE_NATIVE_ENDIAN,
>> +};
>> +
>> +/*******************************************************************/
>> +/*******************************************************************/
>> +
>> +static const VMStateDescription erst_vmstate  = {
>> +    .name = "acpi-erst",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT8(operation, ERSTDeviceState),
>> +        VMSTATE_UINT8(busy_status, ERSTDeviceState),
>> +        VMSTATE_UINT8(command_status, ERSTDeviceState),
>> +        VMSTATE_UINT32(record_offset, ERSTDeviceState),
>> +        VMSTATE_UINT32(record_count, ERSTDeviceState),
>> +        VMSTATE_UINT64(reg_action, ERSTDeviceState),
>> +        VMSTATE_UINT64(reg_value, ERSTDeviceState),
>> +        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
>> +        VMSTATE_UINT8_ARRAY(record, ERSTDeviceState, ERST_RECORD_SIZE),
>> +        VMSTATE_UINT8_ARRAY(tmp_record, ERSTDeviceState, ERST_RECORD_SIZE),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>> +static void erst_realizefn(PCIDevice *pci_dev, Error **errp)
>> +{
>> +    ERSTDeviceState *s = ACPIERST(pci_dev);
>> +    unsigned index = 0;
>> +    bool share;
>> +
>> +    trace_acpi_erst_realizefn_in();
>> +
>> +    if (!s->hostmem) {
>> +        error_setg(errp, "'" ACPI_ERST_MEMDEV_PROP "' property is not set");
>> +        return;
>> +    } else if (host_memory_backend_is_mapped(s->hostmem)) {
>> +        error_setg(errp, "can't use already busy memdev: %s",
>> +                   object_get_canonical_path_component(OBJECT(s->hostmem)));
>> +        return;
>> +    }
>> +
>> +    share = object_property_get_bool(OBJECT(s->hostmem), "share", &error_fatal);
> s/&error_fatal/errp/
Corrected

> 
>> +    if (!share) {
>> +        error_setg(errp, "ACPI ERST requires hostmem property share=on: %s",
>> +                   object_get_canonical_path_component(OBJECT(s->hostmem)));
>> +    }
> This limits possible to use backends to file|memfd only, so
> I wonder if really need this limitation, what if user doesn't
> care about preserving it across QEMU restarts. (i.e. usecase
> where storage is used as a means to troubleshoot guest crash
> i.e. QEMU is not restarted in between)
> 
> Maybe instead of enforcing we should just document that if user
> wishes to preserve content they should use file|memfd backend with
> share=on option.

I've removed this check. It is documented the way it is intended to be used.

> 
>> +
>> +    s->hostmem_mr = host_memory_backend_get_memory(s->hostmem);
>> +
>> +    /* HostMemoryBackend size will be multiple of PAGE_SIZE */
>> +    s->prop_size = object_property_get_int(OBJECT(s->hostmem), "size", &error_fatal);
> s/&error_fatal/errp/
Corrected

> 
>> +
>> +    /* Convert prop_size to integer multiple of ERST_RECORD_SIZE */
>> +    s->prop_size -= (s->prop_size % ERST_RECORD_SIZE);
> 
> pls, no fixups on behalf of user, if size is not what it should be
> error out with suggestion how to fix it.
Removed

> 
>> +
>> +    /*
>> +     * MemoryBackend initializes contents to zero, but we actually
>> +     * want contents initialized to 0xFF, ERST_INVALID_RECORD_ID.
>> +     */
>> +    if (copy_from_nvram_by_index(s, 0) == ACPI_ERST_STATUS_SUCCESS) {
>> +        if (s->tmp_record[0] == 0x00) {
>> +            memset(s->tmp_record, 0xFF, ERST_RECORD_SIZE);
> this doesn't scale,
> (set backend size to more than host physical RAM, put it on slow storage and have fun.)
of course, which is why i think we need to have an upper bound (my early
submissions did).

> 
> Is it possible to use 0 as invalid record id or change storage format
> so you would not have to rewrite whole file at startup (maybe some sort
no

> of metadata header/records book-keeping table before actual records.
> And initialize file only if header is invalid.)
I have to scan the backend storage anyway in order to initialize the record
count, so I've combined that scan with a test to see if the backend storage
needs to be initialized.

> 
>> +            index = 0;
>> +            while (copy_to_nvram_by_index(s, index) == ACPI_ERST_STATUS_SUCCESS) {
>> +                ++index;
>> +            }
> also back&forth copying here is not really necessary.
corrected

> 
>> +        }
>> +    }
>> +
>> +    /* Initialize record_count */
>> +    get_erst_record_count(s);
> why not put it into reset?
It is initialized once, then subsequent write/clear operations update
the counter as needed.

> 
>> +
>> +    /* BAR 0: Programming registers */
>> +    memory_region_init_io(&s->iomem, OBJECT(pci_dev), &erst_reg_ops, s,
>> +                          TYPE_ACPI_ERST, ERST_REG_SIZE);
>> +    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->iomem);
>> +
> 
>> +    /* BAR 1: Exchange buffer memory */
>> +    memory_region_init_io(&s->nvmem, OBJECT(pci_dev), &erst_mem_ops, s,
>> +                          TYPE_ACPI_ERST, ERST_RECORD_SIZE);
>> +    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->nvmem);
> 
> **)
> instead of using mmio for buffer where each write causes
> guest exit to QEMU, map memory region directly to guest.
> see ivshmem_bar2, the only difference with ivshmem, you'd
> create memory region manually (for example you can use
> memory_region_init_resizeable_ram)
> 
> this way you can speedup access and drop erst_mem_ops and
> [tmp_]record intermediate buffers.
> 
> Instead of [tmp_]record you can copy record content
> directly between buffer and backend memory regions.

I've changed the exchange buffer into a MemoryBackend object and
eliminated the erst_mem_ops.

> 
>> +    /*
>> +     * The vmstate_register_ram_global() puts the memory in
>> +     * migration stream, where it is written back to the memory
>> +     * upon reaching the destination, which causes the backing
>> +     * file to be updated (with share=on).
>> +     */
>> +    vmstate_register_ram_global(s->hostmem_mr);
>> +
>> +    trace_acpi_erst_realizefn_out(s->prop_size);
>> +}
>> +
>> +static void erst_reset(DeviceState *dev)
>> +{
>> +    ERSTDeviceState *s = ACPIERST(dev);
>> +
>> +    trace_acpi_erst_reset_in(s->record_count);
>> +    s->operation = 0;
>> +    s->busy_status = 0;
>> +    s->command_status = ACPI_ERST_STATUS_SUCCESS;
> 
>> +    /* indicate empty/no-more until further notice */
> pls rephrase, I'm not sure what it's trying to say
Eliminated; I don't know why I was trying to say there either
> 
>> +    s->record_identifier = ERST_INVALID_RECORD_ID;
>> +    s->record_offset = 0;
>> +    s->next_record_index = 0;
> 
>> +    /* NOTE: record_count and nvram are initialized elsewhere */
>> +    trace_acpi_erst_reset_out(s->record_count);
>> +}
>> +
>> +static Property erst_properties[] = {
>> +    DEFINE_PROP_LINK(ACPI_ERST_MEMDEV_PROP, ERSTDeviceState, hostmem,
>> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void erst_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>> +
>> +    trace_acpi_erst_class_init_in();
>> +    k->realize = erst_realizefn;
>> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
>> +    k->device_id = PCI_DEVICE_ID_REDHAT_ACPI_ERST;
>> +    k->revision = 0x00;
>> +    k->class_id = PCI_CLASS_OTHERS;
>> +    dc->reset = erst_reset;
>> +    dc->vmsd = &erst_vmstate;
>> +    dc->user_creatable = true;
>> +    device_class_set_props(dc, erst_properties);
>> +    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>> +    trace_acpi_erst_class_init_out();
>> +}
>> +
>> +static const TypeInfo erst_type_info = {
>> +    .name          = TYPE_ACPI_ERST,
>> +    .parent        = TYPE_PCI_DEVICE,
>> +    .class_init    = erst_class_init,
>> +    .instance_size = sizeof(ERSTDeviceState),
>> +    .interfaces = (InterfaceInfo[]) {
>> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> what is this for here?
> 
>> +        { }
>> +    }
>> +};
>> +
>> +static void erst_register_types(void)
>> +{
>> +    type_register_static(&erst_type_info);
>> +}
>> +
>> +type_init(erst_register_types)
>> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
>> index dd69577..262a8ee 100644
>> --- a/hw/acpi/meson.build
>> +++ b/hw/acpi/meson.build
>> @@ -4,6 +4,7 @@ acpi_ss.add(files(
>>     'aml-build.c',
>>     'bios-linker-loader.c',
>>     'utils.c',
>> +  'erst.c',
>>   ))
>>   acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
>>   acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))
> 


  reply	other threads:[~2021-07-21 16:09 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-30 19:07 [PATCH v5 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 02/10] ACPI ERST: specification for ERST support Eric DeVolder
2021-06-30 19:26   ` Eric DeVolder
2021-07-19 15:02     ` Igor Mammedov
2021-07-21 15:42       ` Eric DeVolder
2021-07-26 10:06         ` Igor Mammedov
2021-07-26 19:52           ` Eric DeVolder
2021-07-27 11:45             ` Igor Mammedov
2021-07-28 15:16               ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 03/10] ACPI ERST: PCI device_id for ERST Eric DeVolder
2021-07-19 15:06   ` Igor Mammedov
2021-07-21 15:42     ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 04/10] ACPI ERST: header file " Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 05/10] ACPI ERST: support for ACPI ERST feature Eric DeVolder
2021-07-20 12:17   ` Igor Mammedov
2021-07-21 16:07     ` Eric DeVolder [this message]
2021-07-26 10:42       ` Igor Mammedov
2021-07-26 20:01         ` Eric DeVolder
2021-07-27 12:52           ` Igor Mammedov
2021-08-04 22:13             ` Eric DeVolder
2021-08-05  9:05               ` Igor Mammedov
2021-07-21 17:36     ` Eric DeVolder
2021-07-26 10:13       ` Igor Mammedov
2021-06-30 19:07 ` [PATCH v5 06/10] ACPI ERST: build the ACPI ERST table Eric DeVolder
2021-07-20 13:16   ` Igor Mammedov
2021-07-20 14:59     ` Igor Mammedov
2021-07-21 16:12     ` Eric DeVolder
2021-07-26 11:00       ` Igor Mammedov
2021-07-26 20:02         ` Eric DeVolder
2021-07-27 12:01           ` Igor Mammedov
2021-07-28 15:18             ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 07/10] ACPI ERST: trace support Eric DeVolder
2021-07-20 13:15   ` Igor Mammedov
2021-07-21 16:14     ` Eric DeVolder
2021-07-26 11:08       ` Igor Mammedov
2021-07-26 20:03         ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 08/10] ACPI ERST: create ACPI ERST table for pc/x86 machines Eric DeVolder
2021-07-20 13:19   ` Igor Mammedov
2021-07-21 16:16     ` Eric DeVolder
2021-07-26 11:30       ` Igor Mammedov
2021-07-26 20:03         ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 09/10] ACPI ERST: qtest for ERST Eric DeVolder
2021-07-20 13:38   ` Igor Mammedov
2021-07-21 16:18     ` Eric DeVolder
2021-07-26 11:45       ` Igor Mammedov
2021-07-26 20:06         ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 10/10] ACPI ERST: step 6 of bios-tables-test.c Eric DeVolder
2021-07-20 13:24   ` Igor Mammedov
2021-07-21 16:19     ` Eric DeVolder
2021-07-13 20:38 ` [PATCH v5 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Michael S. Tsirkin
2021-07-21 15:23   ` Eric DeVolder
2021-07-20 14:57 ` Igor Mammedov
2021-07-21 15:26   ` Eric DeVolder
2021-07-23 16:26   ` Eric DeVolder
2021-07-27 12:55   ` Igor Mammedov
2021-07-28 15:19     ` Eric DeVolder
2021-07-29  8:07       ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0a09250e-4347-0ad6-d95c-0bfd2094b98b@oracle.com \
    --to=eric.devolder@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.