From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB797C433E0 for ; Tue, 16 Mar 2021 23:28:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9AE6A64F92 for ; Tue, 16 Mar 2021 23:28:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229524AbhCPX2U convert rfc822-to-8bit (ORCPT ); Tue, 16 Mar 2021 19:28:20 -0400 Received: from server.avery-design.com ([198.57.169.184]:52906 "EHLO server.avery-design.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229505AbhCPX2L (ORCPT ); Tue, 16 Mar 2021 19:28:11 -0400 Received: from ool-44c0a99c.dyn.optonline.net ([68.192.169.156]:55986 helo=[192.168.1.180]) by server.avery-design.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1lMJ5R-0005MW-Cb; Tue, 16 Mar 2021 23:26:53 +0000 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\)) Subject: Re: [RFC PATCH 1/2] PCI/doe: Initial support PCI Data Object Exchange From: Chris Browy In-Reply-To: Date: Tue, 16 Mar 2021 19:26:50 -0400 Cc: Jonathan Cameron , linux-cxl@vger.kernel.org, Linux PCI , Ben Widawsky , Bjorn Helgaas , Linux ACPI , "Schofield, Alison" , Vishal L Verma , "Weiny, Ira" , Lorenzo Pieralisi , Linuxarm , Fangjian Reply-To: CAPcyv4h6hHCuO=0vHbPz2m4qw6-0=wW9swBrWimBsz6_GJu4Aw@mail.gmail.com Content-Transfer-Encoding: 8BIT Message-Id: <6F0B8DDD-E661-40C8-839B-1B77998EFF23@avery-design.com> References: <20210310180306.1588376-1-Jonathan.Cameron@huawei.com> <20210310180306.1588376-2-Jonathan.Cameron@huawei.com> <20210316162952.00001ab7@Huawei.com> To: Dan Williams X-Mailer: Apple Mail (2.3608.120.23.2.4) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.avery-design.com X-AntiAbuse: Original Domain - vger.kernel.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - avery-design.com X-Get-Message-Sender-Via: server.avery-design.com: authenticated_id: cbrowy@avery-design.com X-Authenticated-Sender: server.avery-design.com: cbrowy@avery-design.com X-Source: X-Source-Args: X-Source-Dir: Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Please address and clarify 2 queries below... > On Mar 16, 2021, at 2:14 PM, Dan Williams wrote: > > On Tue, Mar 16, 2021 at 9:31 AM Jonathan Cameron > wrote: >> >> On Mon, 15 Mar 2021 12:45:49 -0700 >> Dan Williams wrote: >> >>> Hey Jonathan, happy to see this, some comments below... >> >> Hi Dan, >> >> Thanks for taking a look! >> >>> >>> On Wed, Mar 10, 2021 at 10:08 AM Jonathan Cameron >>> wrote: >>>> >>>> Introduced in an ECN to the PCI 5.0, DOE provides a config space >>>> based mailbox with standard protocol discovery. Each mailbox >>>> is accessed through a DOE PCIE Extended Capability. >>>> >>>> A device may have 1 or more DOE mailboxes, each of which is allowed >>>> to support any number of protocols (some DOE protocols >>>> specifications apply additional restrictions). A given protocol >>>> may be supported on more than one DOE mailbox on a given function. >>> >>> Are all those protocol instances shared? >>> I'm trying to mental model >>> whether, for example, an auxiliary driver instance could be loaded per >>> DOE mailbox, or if there would need to be coordination of a given >>> protocol no matter how many DOE mailboxes on that device implemented >>> that protocol. >> >> Just to check I've understood corectly, you mean multiple instances of same >> protocol across different DOE mailboxes on a given device? >> > > Right. Could you confirm this case for clarity? A CXL device may have multiple VF/PF. For example, PF=0 could have one or more DOE instances for CDAT protocol. The driver will scan PF=0 for all DOE instances and finding one or more of CDAT protocol will combine/manage them. I had not considered multiple CDAT tables for single PF. For CXL devices with multiple PF’s the same process would be carried out on PF=1-N. > >> At DOE ECN level I don't think it is actually defined if they can >> interact or not. I've trawled though the released protocols that I know of >> to see if there is a consensus but not finding much information. >> >> I would argue however that there would be no reason to have the OS make >> use of more than one DOE mailbox for the same protocol. Bit fiddly to >> handle, but doesn't seem impossible to only register a protocol with first >> DOE that supports it. >> >> CMA does talk about use of multiple methods to communicate with the device >> and the need for results consistency. However that is referring to out of >> band vs DOE rather than multiple DOEs. Plus it isn't making statements >> about protocol coordination just responses to particular queries. >> >> Things might get crazy if you tried to do IDE setup from two different DOE >> mailboxes. The IDE ECN refers to "the specific instance of DOE used for..." >> implying I think that there might be multiple but software should only >> use one of them? >> >> My other gut feeling is that only some of the DOE mailboxes are ever going >> to be in the control of Linux. IDE calls out models where firmware or a TEE is >> responsible for it for example. I'm not sure how that is going to be communicated >> to the OS (can guess of course) >> >> Sub drivers are a plausible model that I'll think about some more - but >> for now it feels like too early to go that way.. > > Ok, fair enough. > >> >>> >>>> >>>> The current infrastructure is fairly simplistic and pushes the burden >>>> of handling this many-to-many relantionship to the drivers. In many >>> >>> s/relantionship/relationship/ >>> >>>> cases the arrangement will be static, making this straight forward. >>>> >>>> Open questions: >>>> * timeouts: The DOE specification allows for 1 second for some >>>> operations, but notes that specific protocols may have different >>>> requirements. Should we introduce the flexiblity now, or leave >>> >>> s/flexiblity/flexibility/ >> >> Gah. One day I'll remember to spell check. Sorry about that. >> >>> >>>> that to be implemented when support for such a protocol is added? >>> >>> If the timeout is property of the protocol then perhaps it should wait >>> and not be modeled at the transport level, but that's just an initial >>> reaction. I have not spent quality time with the DOE spec. >> >> I'm not sure it's possible to do so without breaking the abstraction of >> DOE request / response into a bunch of messy sub steps. Perhaps there is >> a clean way of doing it but I can't immediately think of it. >> >> If a protocol comes along that varies the timeout we can just add >> a parameter to say what it is on a call by call basis. > > Now that I've had a chance to take a look the spec seems to > unequivocally mandate the timeouts in "6.xx.1 Operation", where was > the per-protocol timeout implied? > >>>> * DOE mailboxes may use MSI / MSIX to signal that the have prepared >>>> a response. These require normal conditions are setup by the driver. >>>> Should we move some of this into the DOE support (such as ensuring >>>> bus mastering is enabled)? >>> >>> DOE support seems suitable to just be a library and leave the >>> host-device management to the host driver. >> >> Agreed. Though might be worth some debug checks. >> >> Speaking from experience it's easy to spend half a day wondering why your >> interrupts aren't turning up (I was blaming QEMU) because bus mastering >> wasn't enabled. > > Sure, no concern about validating assumptions in the library, but > leave control to the host. > >>>> Testing conducted against QEMU using: >>>> >>>> https://lore.kernel.org/qemu-devel/1612900760-7361-1-git-send-email-cbrowy@avery-design.com/ >>>> + fix for interrupt flag mentioned in that thread. >>>> >>> >>> I came across this the other day and made me wonder about SPDM >>> emulation as another test case: >>> >>> https://cfp.osfc.io/media/osfc2020/submissions/ECQ88N/resources/An_open_source_SPDM_implementation_for_secure_devi_kmIgAQe.pdf >> >> Nice! Looking at CMA / IDE emulation was on my todo list and that looks like >> it might make that job a lot easier. Would it be useful to integrate the openspdm’s SpdmResponderEmu.c onto the QEMU’s CXL Type3 Device’s DOE backend for CMA/IDE testing? Doesn’t look hard to do. >> >>> >>> >>>> Additional testing to be done, particularly around error handling. >>>> >>>> Signed-off-by: Jonathan Cameron >> >> Anything not commented on should be in v2. >> >>>> --- >>>> drivers/pci/pcie/Kconfig | 8 + >>>> drivers/pci/pcie/Makefile | 1 + >>>> drivers/pci/pcie/doe.c | 284 ++++++++++++++++++++++++++++++++++ >>>> include/linux/pcie-doe.h | 35 +++++ >>>> include/uapi/linux/pci_regs.h | 29 +++- >>>> 5 files changed, 356 insertions(+), 1 deletion(-) >>> >>>> diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile >>>> index b2980db88cc0..801fdd5fbfc1 100644 >>>> --- a/drivers/pci/pcie/Makefile >>>> +++ b/drivers/pci/pcie/Makefile >>>> @@ -13,3 +13,4 @@ obj-$(CONFIG_PCIE_PME) += pme.o >>>> obj-$(CONFIG_PCIE_DPC) += dpc.o >>>> obj-$(CONFIG_PCIE_PTM) += ptm.o >>>> obj-$(CONFIG_PCIE_EDR) += edr.o >>>> +obj-$(CONFIG_PCIE_DOE) += doe.o >>>> diff --git a/drivers/pci/pcie/doe.c b/drivers/pci/pcie/doe.c >>>> new file mode 100644 >>>> index 000000000000..b091ef379362 >>>> --- /dev/null >>>> +++ b/drivers/pci/pcie/doe.c >>>> @@ -0,0 +1,284 @@ >>>> +// SPDX-License-Identifier: GPL-2.0 >>>> +/* >>>> + * Data Object Exchange was added to the PCI spec as an ECN to 5.0. >>> >>> Perhaps just put the ECN link here? >> >> It's by number so I've left the title here as well as a link. > > Ok. > >> >>> >>>> + * >>>> + * Copyright (C) 2021 Huawei >>>> + * Jonathan Cameron >>>> + */ >>>> + >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> + >>>> +static irqreturn_t doe_irq(int irq, void *data) >>>> +{ >>>> + struct pcie_doe *doe = data; >>>> + struct pci_dev *pdev = doe->pdev; >>>> + u32 val; >>>> + >>>> + pci_read_config_dword(pdev, doe->cap_offset + PCI_DOE_STATUS, &val); >>>> + if (FIELD_GET(PCI_DOE_STATUS_INT_STATUS, val)) { >>>> + pci_write_config_dword(pdev, doe->cap_offset + PCI_DOE_STATUS, >>>> + val); >>>> + complete(&doe->c); >>>> + return IRQ_HANDLED; >>>> + } >>>> + /* Leave the error case to be handled outside irq */ >>>> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) { >>>> + complete(&doe->c); >>>> + return IRQ_HANDLED; >>>> + } >>> >>> Only one DOE command can be outstanding at a time per PCI device? >> >> No, unless I'm missing something, that is one command per DOE mailbox at a time. >> The completion is part of the pcie_doe structure, not the pci_dev. >> That represents a single DOE mailbox. >> >> There can be multiple commands in flight to multiple DOE mailboxes. Not clear >> that there ever will be in real use cases however. >> >> This comes up later wrt to async operation. The mailbox only >> supports one request / response cycle at a time, they cannot be overlapped. > > "6.xx.1 Operation" says "If a single DOE instance supports multiple > data object protocols, system firmware/software is permitted to > interleave requests/responses with different data object protocols." > > ...although I must say I don't understand how system software tracks > which response belongs to which request if the transactions are > interleaved. > >>> This >>> seems insufficient in the multi-mailbox case / feels like there should >>> be a 'struct pcie_doe_request' object to track what it is to be >>> completed. >> >> No need for the complexity with one request / response in flight per >> mailbox at a time and each mailbox having separate state maintenance. > > I think the workqueue proposal removes the need for pcie_doe_request, > but still allows for the possibility of interleaving requests. > >> >>> >>>> + >>>> + return IRQ_NONE; >>>> +} >>>> + >>>> +static int pcie_doe_abort(struct pcie_doe *doe) >>>> +{ >>>> + struct pci_dev *pdev = doe->pdev; >>>> + int retry = 0; >>>> + u32 val; >>>> + >>>> + pci_write_config_dword(pdev, doe->cap_offset + PCI_DOE_CTRL, >>>> + PCI_DOE_CTRL_ABORT); >>>> + /* Abort is allowed to take up to 1 second */ >>>> + do { >>>> + retry++; >>>> + pci_read_config_dword(pdev, doe->cap_offset + PCI_DOE_STATUS, >>>> + &val); >>>> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val) && >>>> + !FIELD_GET(PCI_DOE_STATUS_BUSY, val)) >>>> + return 0; >>>> + usleep_range(1000, 2000); >>>> + } while (retry < 1000); >>>> + >>>> + return -EIO; >>> >>> What's the state of the mailbox after an abort failure? >> >> Good question. I think the answer to that is dead device, reboot the machine >> or at least the device if you can do a hard enough slot reset. > > ...and hopefully that device is not part of an active interleave > otherwise a reset can take down "System RAM". > >> >> The specification goes with... >> "It is strongly recommend that implementations ensure that the functionality >> of the DOE Abort bit is resilient, including that DOE Abort functionality is >> maintained even in cases where device firmware is malfunctioning" > > Ok. > >> >> So cross our fingers everyone obeys that strong recommendation or try to >> work out what to do? > > What's the worst that can happen? > >> >>> >>>> +} >>>> + >>>> +/** >>>> + * pcie_doe_init() - Initialise a Data Object Exchange mailbox >>>> + * @doe: state structure for the DOE mailbox >>>> + * @pdev: pci device which has this DOE mailbox >>>> + * @doe_offset: offset in configuration space of the DOE extended capability. >>>> + * @use_int: whether to use the optional interrupt >>>> + * Returns: 0 on success, <0 on error >>>> + * >>>> + * Caller responsible for calling pci_alloc_irq_vectors() including DOE >>>> + * interrupt. >>>> + */ >>>> +int pcie_doe_init(struct pcie_doe *doe, struct pci_dev *pdev, int doe_offset, >>>> + bool use_int) >>>> +{ >>>> + u32 val; >>>> + int rc; >>>> + >>>> + mutex_init(&doe->lock); >>>> + init_completion(&doe->c); >>>> + doe->cap_offset = doe_offset; >>>> + doe->pdev = pdev; >>>> + /* Reset the mailbox by issuing an abort */ >>>> + rc = pcie_doe_abort(doe); >>>> + if (rc) >>>> + return rc; >>>> + >>>> + pci_read_config_dword(pdev, doe_offset + PCI_DOE_CAP, &val); >>>> + >>>> + if (use_int && FIELD_GET(PCI_DOE_CAP_INT, val)) { >>>> + rc = devm_request_irq(&pdev->dev, >>> >>> Lets not hide devm semantics from the caller, so at a minimum this >>> function should be called pcim_pcie_doe_init() to indicate to the >>> caller that it has placed something into the devm stack. However, this >>> may not be convenient for the caller. I'd leave it to the user to call >>> a pcie_doe() unregister routine via devm_add_action_or_reset() if it >>> wants. >> >>> >>> Lastly, I don't expect _init() routines to fail so perhaps split this >>> into pure "init" and "register" functionality? >> >> I'm a bit doubtful on naming of register() but will go with that for v2. >> >> It's not registering with anything so that feels a bit wrong as a description >> for part 2 of setup. Can leave that bike shedding for now though. >> > > Ok, just searching for a name that implies symmetrical teardown > register/unregister, enable/disable, ... etc. init/deinit doesn't do > it for me. > >>> >>>> + pci_irq_vector(pdev, >>>> + FIELD_GET(PCI_DOE_CAP_IRQ, val)), >>>> + doe_irq, 0, "DOE", doe); >>>> + if (rc) >>>> + return rc; >>>> + >>>> + doe->use_int = use_int; >>>> + pci_write_config_dword(pdev, doe_offset + PCI_DOE_CTRL, >>>> + FIELD_PREP(PCI_DOE_CTRL_INT_EN, 1)); >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> + >>>> +/** >>>> + * pcie_doe_exchange() - Send a request and receive a response >>>> + * @doe: DOE mailbox state structure >>>> + * @request: request data to be sent >>>> + * @request_sz: size of request in bytes >>>> + * @response: buffer into which to place the response >>>> + * @response_sz: size of available response buffer in bytes >>>> + * >>>> + * Return: 0 on success, < 0 on error >>>> + * Excess data will be discarded. >>>> + */ >>>> +int pcie_doe_exchange(struct pcie_doe *doe, u32 *request, size_t request_sz, >>>> + u32 *response, size_t response_sz) >>> >>> Are requests made against a specific protocol? >> >> Yes, but the descriptive header is very brea. >> >>> >>> This interface feels under-decorated for a public API for host-drivers to use. >> >> I'll see what I can come up with for v2. >> Likely to look something like >> >> int pcie_doe_exchange(struct pci_doe *doe, u16 vid, u8 type, >> u32 *request_pl, size_t request_pl_sz, >> u32 *response_pl, size_t response_pl_sz) > > I was thinking something like 'struct pcie_doe_object' pointers rather > than u32 arrays. > >> >> and return received length or negative on error. >> >> The disadvantage is that at least some of the specs just have the >> header as their first few DW. So there isn't a clear distinction >> between header and payload. May lead to people getting offsets wrong >> in a way they wouldn't do if driver was responsible for building the >> whole message. > > Aren't they more likely to get offsets wrong with u32 arrays rather > than data structures? > >> >>> >>>> +{ >>>> + struct pci_dev *pdev = doe->pdev; >>>> + int ret = 0; >>>> + int i; >>>> + u32 val; >>>> + int retry = -1; >>>> + size_t length; >>>> + >>>> + /* DOE requests must be a whole number of DW */ >>>> + if (request_sz % sizeof(u32)) >>>> + return -EINVAL; >>>> + >>>> + /* Need at least 2 DW to get the length */ >>>> + if (response_sz < 2 * sizeof(u32)) >>>> + return -EINVAL; >>>> + >>>> + mutex_lock(&doe->lock); >>>> + /* >>>> + * Check the DOE busy bit is not set. >>>> + * If it is set, this could indicate someone other than Linux is >>>> + * using the mailbox. >>>> + */ >>> >>> Ugh, makes me think we need to extend the support for blocking pci >>> device MMIO while a driver is attached to config-space as well. How >>> can a communication protocol work if initiators can trample each >>> other's state? >> >> Agreed. It is crazy. At very least we need a means of saying >> keep your hands off this DOE to the OS. >> >> We can't do it on a per protocol basis, which was what I was previously >> thinking, because we can't call the discovery protocol to see what >> a given DOE is for. > > I'm specifically thinking of a mechanism that blocks pci-sysfs from > initiating config-cycles if a driver has claimed that range. > > However, these MCTP to DOE tunnels that the SPDM presentation alluded > to make me nervous as there is no protocol to prevent an OS driver > agent and an MCTP agent from clobbering each other. > >> >>> >>>> + pci_read_config_dword(pdev, doe->cap_offset + PCI_DOE_STATUS, &val); >>>> + if (FIELD_GET(PCI_DOE_STATUS_BUSY, val)) { >>>> + ret = -EBUSY; >>>> + goto unlock; >>>> + } >>>> + >>>> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) { >>>> + ret = pcie_doe_abort(doe); >>>> + if (ret) >>>> + goto unlock; >>>> + } >>>> + >>>> + for (i = 0; i < request_sz / 4; i++) >>>> + pci_write_config_dword(pdev, doe->cap_offset + PCI_DOE_WRITE, >>>> + request[i]); >>>> + >>>> + reinit_completion(&doe->c); >>>> + pci_write_config_dword(pdev, doe->cap_offset + PCI_DOE_CTRL, >>>> + PCI_DOE_CTRL_GO); >>>> + >>>> + if (doe->use_int) { >>>> + /* >>>> + * Timeout of 1 second from 6.xx.1 ECN - Data Object Exchange >>>> + * Note a protocol is allowed to specify a different timeout, so >>>> + * that may need supporting in future. >>>> + */ >>>> + if (!wait_for_completion_timeout(&doe->c, >>>> + msecs_to_jiffies(1000))) { >>> >>> s/msecs_to_jiffies(1000)/HZ/ >> >> huh. Missed that :) > > Yeah, the shorthand that X*HZ == "X seconds worth of jiffies" is just > something I picked up from other drivers not explicit documentation. > >> >>> >>>> + ret = -ETIMEDOUT; >>>> + goto unlock; >>>> + } >>>> + >>>> + pci_read_config_dword(pdev, >>>> + doe->cap_offset + PCI_DOE_STATUS, >>>> + &val); >>>> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) { >>>> + pcie_doe_abort(doe); >>>> + ret = -EIO; >>>> + goto unlock; >>>> + } >>>> + } else { >>>> + do { >>>> + retry++; >>>> + pci_read_config_dword(pdev, >>>> + doe->cap_offset + PCI_DOE_STATUS, >>>> + &val); >>>> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) { >>>> + pcie_doe_abort(doe); >>>> + ret = -EIO; >>>> + goto unlock; >>>> + } >>>> + >>>> + if (FIELD_GET(PCI_DOE_STATUS_DATA_OBJECT_READY, val)) >>>> + break; >>>> + usleep_range(1000, 2000); >>>> + } while (retry < 1000); >>>> + if (!FIELD_GET(PCI_DOE_STATUS_DATA_OBJECT_READY, val)) { >>>> + ret = -ETIMEDOUT; >>>> + goto unlock; >>> >>> Rather than a lock and polling loop I'd organize this as a single >>> threaded delayed_workqueue that periodically services requests or >>> immediately runs the workqueue upon receipt of an interrupt. This >>> provides a software queuing model that can optionally be treated as >>> async / sync depending on the use case. >> >> Given it's single element in flight I don't think there is any benefit >> to enabling async. The lock has to be held throughout anyway. >> It is always possible a particular caller wants to overlap this >> transaction with some other actions, but I'd rather put the burden >> on that clever caller which can spin this out to a thread of one type >> or another. >> >> We can revisit and split this in half if we have a user who benefits >> from the complexity. > > I don't think it's complex. I think it's simpler to rationalize than > this pattern of taking a lock and going to sleep with the lock held. > You can eliminate the lock completely if the only access to a given > DOE is a single dedicated kthread. There are other examples of this > single-thread protocol handler pattern in the kernel, like libsas SMP > protocol. > >>>> + } >>>> + } >>>> + >>>> + /* Read the first two dwords to get the length */ >>>> + pci_read_config_dword(pdev, doe->cap_offset + PCI_DOE_READ, >>>> + &response[0]); >>>> + pci_write_config_dword(pdev, doe->cap_offset + PCI_DOE_READ, 0); >>>> + pci_read_config_dword(pdev, doe->cap_offset + PCI_DOE_READ, >>>> + &response[1]); >>>> + pci_write_config_dword(pdev, doe->cap_offset + PCI_DOE_READ, 0); >>>> + length = FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH, >>>> + response[1]); >>>> + if (length > SZ_1M) >> >> oops. That's exiting with mutex held. Fixed in v2. >> >>>> + return -EIO; >>>> + >>>> + for (i = 2; i < min(length, response_sz / 4); i++) { >>>> + pci_read_config_dword(pdev, doe->cap_offset + PCI_DOE_READ, >>>> + &response[i]); >>>> + pci_write_config_dword(pdev, doe->cap_offset + PCI_DOE_READ, 0); >>>> + } >>>> + /* flush excess length */ >>>> + for (; i < length; i++) { >>>> + pci_read_config_dword(pdev, doe->cap_offset + PCI_DOE_READ, >>>> + &val); >>>> + pci_write_config_dword(pdev, doe->cap_offset + PCI_DOE_READ, 0); >>>> + } >>>> + /* Final error check to pick up on any since Data Object Ready */ >>>> + pci_read_config_dword(pdev, doe->cap_offset + PCI_DOE_STATUS, &val); >>>> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) { >>>> + pcie_doe_abort(doe); >>>> + ret = -EIO; >>>> + } >>>> +unlock: >>>> + mutex_unlock(&doe->lock); >>>> + >>>> + return ret; >>>> +} >>>> + >>>> + >>>> +static int pcie_doe_discovery(struct pcie_doe *doe, u8 *index, u16 *vid, u8 *protocol) >>>> +{ >>>> + u32 request[3] = { >>> >>> Should this be a proper struct with named fields rather than an array? >> >> Well the field names are going to end up as dw0 dw1 etc as there isn't a lot more >> meaningful to call them. We also want to keep them as u32 values throughout to >> avoid fiddly packing manipulation on different endian machines. > > The DOE object format has dedicated space for type and length. > > If anything the endian issue is more reason to have a proper data structure. > >> >> This becomes rather simpler when it's just the payload due to changes in the >> interface in v2. >> >>> >>>> + [0] = FIELD_PREP(PCI_DOE_DATA_OBJECT_HEADER_1_VID, 0001) | >>>> + FIELD_PREP(PCI_DOE_DATA_OBJECT_HEADER_1_TYPE, 0), >>>> + [1] = FIELD_PREP(PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH, 3), >>>> + [2] = FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX, *index) >>>> + }; >>>> + u32 response[3]; >>>> + int ret; >>>> + >>>> + ret = pcie_doe_exchange(doe, request, sizeof(request), response, sizeof(response)); >>>> + if (ret) >>>> + return ret; >>>> + >>>> + *vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response[2]); >>>> + *protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL, response[2]); >>>> + *index = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX, response[2]); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +/** >>>> + * pcie_doe_protocol_check() - check if this DOE mailbox supports specific protocol >>>> + * @doe: DOE state structure >>>> + * @vid: Vendor ID >>>> + * @protocol: Protocol number as defined by Vendor >>>> + * Returns: 0 on success, <0 on error >>>> + */ >>>> +int pcie_doe_protocol_check(struct pcie_doe *doe, u16 vid, u8 protocol) >>> >>> Not clear to me that this is a comfortable API for a driver. I would >>> expect that at registration time all the supported protocols would be >>> retrieved and cached in the 'struct pcie_doe' context and then the >>> host driver could query from there without going back to the device >>> again. >> >> I'm not sure I follow. >> >> Any driver will fall into one of the following categories: >> a) Already knows what protocols are available on a >> given DOE instance perhaps because that's a characteristic of the hardware >> supported, in which case it has no reason to check (unless driver writer >> is paranoid) >> b) It has no way to know (e.g. class driver), then it makes sense to query >> the DOE instance to find out what protocols are available. > > I was more thinking that the public interface is a protocol rather > than the raw DOE. So the library knows CDAT, SPDM, IDE... and drivers > never need to query the interface. > > So this more of a question about where to draw the line of common code. > > For example in the nfit driver there is usage of: > > acpi_label_write() > > ...and: > > acpi_evaluate_dsm() > > ...where the former abstracts the protocol and the latter is the raw > interface. Both can write to a label area, but only one is idiomatic.