All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alistair Francis <alistair23@gmail.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org,
	Jonathan.Cameron@huawei.com, lukas@wunner.de,
	alex.williamson@redhat.com, christian.koenig@amd.com,
	kch@nvidia.com, gregkh@linuxfoundation.org, logang@deltatee.com,
	linux-kernel@vger.kernel.org, chaitanyak@nvidia.com,
	rdunlap@infradead.org,
	Alistair Francis <alistair.francis@wdc.com>
Subject: Re: [PATCH v9 2/3] PCI/DOE: Expose the DOE features via sysfs
Date: Fri, 3 Nov 2023 12:17:24 +1000	[thread overview]
Message-ID: <CAKmqyKOKX=Nh22RQ6rhKj1p7dpQCfYpVz_LL3MgTmcJ72O5ysQ@mail.gmail.com> (raw)
In-Reply-To: <20231019165829.GA1381099@bhelgaas>

On Fri, Oct 20, 2023 at 2:58 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Oct 13, 2023 at 01:41:57PM +1000, Alistair Francis wrote:
> > The PCIe 6 specification added support for the Data Object Exchange (DOE).
> > When DOE is supported the DOE Discovery Feature must be implemented per
> > PCIe r6.1 sec 6.30.1.1. The protocol allows a requester to obtain
> > information about the other DOE features supported by the device.
> > ...
>
> > +static umode_t pci_doe_sysfs_attr_is_visible(struct kobject *kobj,
> > +                                          struct attribute *a, int n)
> > +{
> > +     struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
> > +     struct pci_doe_mb *doe_mb;
> > +     unsigned long index, j;
> > +     void *entry;
> > +
> > +     xa_for_each(&pdev->doe_mbs, index, doe_mb) {
> > +             xa_for_each(&doe_mb->feats, j, entry)
> > +                     return a->mode;
> > +     }
> > +
> > +     return 0;
>
> The nested loops that don't test anything look a little weird and
> maybe I'm missing something, but this looks like it returns a->mode if
> any mailbox with a feature exists, and 0 otherwise.
>
> Is that the same as this:
>
>   if (pdev->doe_mbs)
>     return a->mode;
>
>   return 0;
>
> since it sounds like a mailbox must support at least one feature?

I don't think this is the exact same.

pdev->doe_mbs exist (created by xa_init()) even if there are no
features supported.

I do think it's important we make sure DOE features exist before we
show the property.

>
> > +}
> > +
> > +static struct attribute *pci_dev_doe_feature_attrs[] = {
> > +     NULL,
> > +};
> > +
> > +const struct attribute_group pci_dev_doe_feature_group = {
> > +     .name   = "doe_features",
> > +     .attrs  = pci_dev_doe_feature_attrs,
> > +     .is_visible = pci_doe_sysfs_attr_is_visible,
> > +};
> > +
> > +static ssize_t pci_doe_sysfs_feature_show(struct device *dev,
> > +                                       struct device_attribute *attr,
> > +                                       char *buf)
> > +{
> > +     return sysfs_emit(buf, "%s\n", attr->attr.name);
> > +}
> > +
> > +static void pci_doe_sysfs_feature_remove(struct pci_dev *pdev,
> > +                                      struct pci_doe_mb *doe_mb)
> > +{
> > +     struct device *dev = &pdev->dev;
> > +     struct device_attribute *attrs = doe_mb->sysfs_attrs;
> > +     unsigned long i;
> > +     void *entry;
> > +
> > +     if (!attrs)
> > +             return;
> > +
> > +     doe_mb->sysfs_attrs = NULL;
> > +     xa_for_each(&doe_mb->feats, i, entry) {
> > +             if (attrs[i].show)
> > +                     sysfs_remove_file_from_group(&dev->kobj, &attrs[i].attr,
> > +                                                  pci_dev_doe_feature_group.name);
> > +             kfree(attrs[i].attr.name);
> > +     }
> > +     kfree(attrs);
> > +}
> > +
> > +static int pci_doe_sysfs_feature_populate(struct pci_dev *pdev,
> > +                                       struct pci_doe_mb *doe_mb)
> > +{
> > +     struct device *dev = &pdev->dev;
> > +     struct device_attribute *attrs;
> > +     unsigned long num_features = 0;
> > +     unsigned long vid, type;
> > +     unsigned long i;
> > +     void *entry;
> > +     int ret;
> > +
> > +     xa_for_each(&doe_mb->feats, i, entry)
> > +             num_features++;
> > +
> > +     attrs = kcalloc(num_features, sizeof(*attrs), GFP_KERNEL);
> > +     if (!attrs)
> > +             return -ENOMEM;
> > +
> > +     doe_mb->sysfs_attrs = attrs;
> > +     xa_for_each(&doe_mb->feats, i, entry) {
> > +             sysfs_attr_init(&attrs[i].attr);
> > +             vid = xa_to_value(entry) >> 8;
> > +             type = xa_to_value(entry) & 0xFF;
> > +             attrs[i].attr.name = kasprintf(GFP_KERNEL,
> > +                                            "0x%04lX:%02lX", vid, type);
>
> What's the rationale for using "0x" on the vendor ID but not on the
> type?  "0x1234:10" hints that the "10" might be decimal since it lacks
> "0x".
>
> Suggest lower-case "%04lx:%02lx" either way.

Fixed!

>
> FWIW, there's no "0x" prefix on the hex vendor IDs in "lspci -n"
> output and dmesg messages like this:
>
>   pci 0000:01:00.0: [10de:13b6] type 00
>
> > +             if (!attrs[i].attr.name) {
> > +                     ret = -ENOMEM;
> > +                     goto fail;
> > +             }
> > +
> > +             attrs[i].attr.mode = 0444;
> > +             attrs[i].show = pci_doe_sysfs_feature_show;
> > +
> > +             ret = sysfs_add_file_to_group(&dev->kobj, &attrs[i].attr,
> > +                                           pci_dev_doe_feature_group.name);
> > +             if (ret) {
> > +                     attrs[i].show = NULL;
> > +                     goto fail;
> > +             }
> > +     }
> > +
> > +     return 0;
> > +
> > +fail:
> > +     pci_doe_sysfs_feature_remove(pdev, doe_mb);
> > +     return ret;
>
> Not sure we should treat this failure this seriously.  Looks like it
> will prevent creation of the BAR resource files, and possibly even
> abort pci_sysfs_init() early.  I think the pci_dev will still be
> usable (lacking DOE sysfs) even if this fails.

I can change the call in pci_create_resource_files() to not return?

>
> > +}
> > +
> > +void pci_doe_sysfs_teardown(struct pci_dev *pdev)
> > +{
> > +     struct pci_doe_mb *doe_mb;
> > +     unsigned long index;
> > +
> > +     xa_for_each(&pdev->doe_mbs, index, doe_mb) {
> > +             pci_doe_sysfs_feature_remove(pdev, doe_mb);
> > +     }
> > +}
> > +
> > +int pci_doe_sysfs_init(struct pci_dev *pdev)
> > +{
> > +     struct pci_doe_mb *doe_mb;
> > +     unsigned long index;
> > +     int ret;
> > +
> > +     xa_for_each(&pdev->doe_mbs, index, doe_mb) {
> > +             ret = pci_doe_sysfs_feature_populate(pdev, doe_mb);
> > +             if (ret)
> > +                     return ret;
> > +     }
>
> I agree with Lukas that pci_create_resource_files() is not the right
> place to call this.
>
> I try hard to avoid calling *anything* from the
> pci_create_sysfs_dev_files() path because it has the nasty
> "sysfs_initialized" check and the associated pci_sysfs_init()
> initcall.
>
> It's so much cleaner when we can set up static attributes that are
> automatically added in the device_add() path.  I don't know whether
> that's possible.  I see lots of discussion with Greg KH that might be
> related, but I'm not sure.

I don't think it's possible, at least not that I or anyone else has
been able to figure out yet.

>
> I do know that we enumerate the mailboxes and features during
> pci_init_capabilities(), which happens before device_add(), so the
> information about which attributes should be present is at least
> *available* early enough:
>
>   pci_host_probe
>     pci_scan_root_bus_bridge
>       ...
>         pci_scan_single_device
>           pci_device_add
>             pci_init_capabilities
>               pci_doe_init
>                 while (pci_find_next_ext_capability(PCI_EXT_CAP_ID_DOE))
>                   pci_doe_create_mb
>                     pci_doe_cache_features
>                       pci_doe_discovery
>                       xa_insert(&doe_mb->feats)   <--
>             device_add
>               device_add_attrs
>                 device_add_groups
>     pci_bus_add_devices
>       pci_bus_add_device
>         pci_create_sysfs_dev_files
>           ...
>             pci_doe_sysfs_init                    <--
>               xa_for_each(&pdev->doe_mbs)
>                 pci_doe_sysfs_feature_populate
>                   xa_for_each(&doe_mb->feats)
>                     sysfs_add_file_to_group(pci_dev_doe_feature_group.name)
>
> Is it feasible to build an attribute group in pci_doe_init() and add
> it to dev->groups so device_add() will automatically add them?

That doesn't work as the sysfs_add_file_to_group() function will seg
fault when trying to find the parent as I don't think it exists yet.

[    0.767581] BUG: kernel NULL pointer dereference, address: 0000000000000008
[    0.767835] #PF: supervisor read access in kernel mode
[    0.767835] #PF: error_code(0x0000) - not-present page
[    0.767835] PGD 0 P4D 0
[    0.767835] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    0.767835] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
6.6.0-10270-g5dda351a02c8-dirty #10
[    0.767835] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.16.2-14-g1e1da7a96300-prebuilt.qemu.org 04/01/2014
[    0.767835] RIP: 0010:kernfs_find_and_get_ns+0x10/0x70
[    0.767835] Code: 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 55 49 89 d5 41 54 49 89
f4 55 53 <48> 8b 0
[    0.767835] RSP: 0018:ffff96f9c00138a8 EFLAGS: 00000246
[    0.767835] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000030
[    0.767835] RDX: 0000000000000000 RSI: ffffffffafec53d9 RDI: 0000000000000000
[    0.767835] RBP: ffff957b4180e0b8 R08: 0000000000000000 R09: 0000000000ffff10
[    0.767835] R10: 0000000000000000 R11: ffffffffaf677c80 R12: ffffffffafec53d9
[    0.767835] R13: 0000000000000000 R14: ffff957b413c1ea0 R15: ffff957b4180e000
[    0.767835] FS:  0000000000000000(0000) GS:ffff957bbdc00000(0000)
knlGS:0000000000000000
[    0.767835] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.767835] CR2: 0000000000000008 CR3: 000000004442e000 CR4: 00000000000006f0
[    0.767835] Call Trace:
[    0.767835]  <TASK>
[    0.767835]  ? __die+0x1e/0x60
[    0.767835]  ? page_fault_oops+0x17c/0x470
[    0.767835]  ? search_module_extables+0x14/0x50
[    0.767835]  ? exc_page_fault+0x67/0x150
[    0.767835]  ? asm_exc_page_fault+0x26/0x30
[    0.767835]  ? __pfx_pci_mmcfg_read+0x10/0x10
[    0.767835]  ? kernfs_find_and_get_ns+0x10/0x70
[    0.767835]  ? kasprintf+0x5a/0x80
[    0.767835]  sysfs_add_file_to_group+0x4c/0x110
[    0.767835]  pci_doe_sysfs_init+0x13b/0x240
[    0.767835]  pci_device_add+0x1d7/0x620
[    0.767835]  pci_scan_single_device+0xc8/0x100
[    0.767835]  pci_scan_slot+0x6f/0x1e0
[    0.767835]  pci_scan_child_bus_extend+0x30/0x210
[    0.767835]  pci_scan_bridge_extend+0x5f4/0x710
[    0.767835]  pci_scan_child_bus_extend+0xc2/0x210
[    0.767835]  acpi_pci_root_create+0x283/0x2f0
[    0.767835]  pci_acpi_scan_root+0x199/0x200
[    0.767835]  acpi_pci_root_add+0x1ba/0x370
[    0.767835]  acpi_bus_attach+0x140/0x260
[    0.767835]  ? __pfx_acpi_dev_for_one_check+0x10/0x10
[    0.767835]  device_for_each_child+0x68/0xa0
[    0.767835]  acpi_dev_for_each_child+0x37/0x60
[    0.767835]  ? __pfx_acpi_bus_attach+0x10/0x10
[    0.767835]  acpi_bus_attach+0x21e/0x260
[    0.767835]  ? __pfx_acpi_dev_for_one_check+0x10/0x10
[    0.767835]  device_for_each_child+0x68/0xa0
[    0.767835]  acpi_dev_for_each_child+0x37/0x60
[    0.767835]  ? __pfx_acpi_bus_attach+0x10/0x10
[    0.767835]  acpi_bus_attach+0x21e/0x260
[    0.767835]  acpi_bus_scan+0x6b/0x1e0
[    0.767835]  acpi_scan_init+0xdc/0x290
[    0.767835]  acpi_init+0x22b/0x500
[    0.767835]  ? __pfx_acpi_init+0x10/0x10
[    0.767835]  do_one_initcall+0x56/0x220
[    0.767835]  kernel_init_freeable+0x19e/0x2d0
[    0.767835]  ? __pfx_kernel_init+0x10/0x10
[    0.767835]  kernel_init+0x15/0x1b0
[    0.767835]  ret_from_fork+0x2f/0x50
[    0.767835]  ? __pfx_kernel_init+0x10/0x10
[    0.767835]  ret_from_fork_asm+0x1b/0x30

I can move this to pci_create_sysfs_dev_files() instead if that's at
least better?

>
> It looks like __power_supply_register() does something like that:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/power/supply/power_supply_core.c?id=v6.5#n1375
>
> > --- a/include/linux/pci-doe.h
> > +++ b/include/linux/pci-doe.h
> > @@ -22,4 +22,6 @@ int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
> >           const void *request, size_t request_sz,
> >           void *response, size_t response_sz);
> >
> > +int pci_doe_sysfs_init(struct pci_dev *pci_dev);
> > +void pci_doe_sysfs_teardown(struct pci_dev *pdev);
>
> These declarations look like they should be in drivers/pci/pci.h as
> well.  I don't think anything outside the PCI core should need these.

I will move these.

Alistair

>
> Bjorn

  parent reply	other threads:[~2023-11-03  2:17 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-13  3:41 [PATCH v9 1/3] PCI/DOE: Rename DOE protocol to feature Alistair Francis
2023-10-13  3:41 ` [PATCH v9 2/3] PCI/DOE: Expose the DOE features via sysfs Alistair Francis
2023-10-17  8:34   ` Lukas Wunner
2023-11-03  1:27     ` Alistair Francis
2023-10-19 16:58   ` Bjorn Helgaas
2023-10-19 19:32     ` Lukas Wunner
2023-10-19 20:01       ` Bjorn Helgaas
2023-11-03  2:17     ` Alistair Francis [this message]
2023-10-13  3:41 ` [PATCH v9 3/3] PCI/DOE: Allow enabling DOE without CXL Alistair Francis
2023-10-18 22:24 ` [PATCH v9 1/3] PCI/DOE: Rename DOE protocol to feature Bjorn Helgaas
2023-11-03  0:18   ` Alistair Francis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKmqyKOKX=Nh22RQ6rhKj1p7dpQCfYpVz_LL3MgTmcJ72O5ysQ@mail.gmail.com' \
    --to=alistair23@gmail.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alex.williamson@redhat.com \
    --cc=alistair.francis@wdc.com \
    --cc=bhelgaas@google.com \
    --cc=chaitanyak@nvidia.com \
    --cc=christian.koenig@amd.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=helgaas@kernel.org \
    --cc=kch@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=lukas@wunner.de \
    --cc=rdunlap@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.