* Linux 5.15-rc1 @ 2021-09-12 23:58 Linus Torvalds 2021-09-13 3:57 ` Guenter Roeck 2021-09-13 14:18 ` Dave Jones 0 siblings, 2 replies; 53+ messages in thread From: Linus Torvalds @ 2021-09-12 23:58 UTC (permalink / raw) To: Linux Kernel Mailing List So 5.15 isn't shaping up to be a particularly large release, at least in number of commits. At only just over 10k non-merge commits, this is in fact the smallest rc1 we have had in the 5.x series. We're usually hovering in the 12-14k commit range. That said, counting commits isn't necessarily the best measure, and that might be particularly true this time around. We have a few new subsystems, with NTFSv3 and ksmbd standing out. And as a result, when you look at the stats on a "lines changed" basis, 5.15-rc1 ends up looking much more middle-of-the-road. It still doesn't look like a particularly _big_ merge window, but also not remotely the smallest one. And while this is not up there with some larger releases, it's actually been one of the messier merge windows. Part of it was self-inflicted damage from me trying to enable -Werror much more aggressively, but I also ended up having to push back a lot more on some of the patch series and had a number o full requests where I went "ok, I've pulled this, but XYZ is wrong". So we've had merge windows that went much more smoothly. In fact, I have a pull request or two that I just didn't feel like going through fully, and I might still pull the upcoming week, but I got a bit fed up with how I ended up seeing new pull requests - and not for fixes - coming in fairly late in the merge window. Yes, the merge window is two weeks, but part of that is very literally to give _me_ time to actually look things through, not for people to send me new requests up until the very end of the merge window. Anyway, I'm hoping that things calm down, and I'll take a look at a few things still in my inbox, but on the whole you should expect that "that's it" and send me fixes only. And in order to get those fixes going, please go out and test this. Appended, as always, is my "mergelog" - since even at "only" 10k+ commits, the shortlog is not really realistically readable or useful as a summary. And as always, the mergelog credits the person I pulled from, which is not the same as the actual author of all the changes. There's just over a hundred people listed below that I've pulled from, but over 1500 people with authorship credit in the git tree. So that's where you'd need to dig for all the details. Thanks, Linus --- Al Viro (4): iov_iter fixes root filesystem type handling updates gfs2 setattr updates namei updates Alex Williamson (1): VFIO updates Alexandre Belloni (1): RTC updates Andreas Gruenbacher (1): gfs2 updates Andrew Morton (3): misc updates more updates yet more updates and hotfixes Anna Schumaker (1): NFS client updates Arnaldo Carvalho de Melo (2): perf tool updates more perf tools updates Arnd Bergmann (5): asm-generic updates ARM SoC updates ARM SoC driver updates ARM defconfig updates ARM SoC DT updates Bartosz Golaszewski (1): gpio updates Benson Leung (1): chrome platform updates Bjorn Andersson (1): remoteproc updates Bjorn Helgaas (1): PCI updates Borislav Petkov (8): EDAC updates RAS update x86 build updates x86 resource control updates x86 cleanups timer fix locking fixes scheduler fixes Casey Schaufler (1): smack updates Catalin Marinas (2): arm64 updates arm64 fixes Christian Brauner (4): move_mount updates close_range() cleanup idmapping documentation updates set_user() update Christoph Hellwig (2): configfs updates dma-mapping updates Chuck Lever (2): nfsd updates nfsd fixes Corey Minyard (1): IPMI updates Dan Williams (2): libnvdimm updates CXL (Compute Express Link) updates Daniel Lezcano (1): thermal updates Daniel Thompson (1): kgdb updates Darrick Wong (3): project quota update iomap updates xfs updates Dave Airlie (2): drm updates drm fixes David Hildenbrand (1): MAP_DENYWRITE removal David Howells (1): fscache updates David Sterba (2): btrfs updates btrfs fixes David Teigland (1): dlm updates Dmitry Torokhov (1): input updates Dominique Martinet (1): 9p updates Eric Biederman (2): siginfo si_trapno updates exit cleanups Eric Biggers (1): fscrypt updates Gao Xiang (1): erofs updates Geert Uytterhoeven (1): m68k updates Greg KH (8): char / misc driver updates driver core updates IIO and staging driver updates tty / serial updates USB / Thunderbolt updates more USB updates habanalabs updates misc driver fix Greg Ungerer (1): m68knommu updates Guenter Roeck (1): hwmon updates Hans de Goede (1): x86 platform driver updates Heiko Carstens (2): s390 updates more s390 updates Helge Deller (3): parisc architecture updates parisc architecture fixes parisc fixes Herbert Xu (1): crypto updates Ilya Dryomov (1): ceph updates Ingo Molnar (4): scheduler updates x86 perf event updates EFI updates memory model updates Jaegeuk Kim (1): f2fs updates Jakub Kicinski (2): networking updates networking fixes and stragglers James Bottomley (1): SCSI updates Jan Kara (4): fsnotify updates FIEMAP cleanups UDF and isofs updates fs hole punching vs cache filling race fixes Jarkko Sakkinen (1): tpm driver updates Jason Gunthorpe (2): rdma updates rdma fixes Jassi Brar (1): mailbox updates Jean Delvare (1): dmi fix Jeff Layton (1): file locking updates Jens Axboe (13): block updates block driver updates libata updates io_uring updates support for struct bio recycling io_uring mkdirat/symlinkat/linkat support io_uring fixes libata fixes CDROM maintainer update block fixes libata maintainer update block fixes io_uring fixes Jessica Yu (1): module updates Jiri Kosina (1): HID updates Joerg Roedel (2): iommu updates iommu fixes Jon Mason (1): NTB updates Jonathan Corbet (2): documentation updates more documentation updates Juergen Gross (1): xen updates Julia Lawall (1): coccinelle updates Kees Cook (1): hardening updates Konrad Rzeszutek Wilk (3): ibft updates swiotlb updates ibft fix Konstantin Komarov (1): NTFSv3 filesystem Lee Jones (2): MFD updates backlight updates Linus Walleij (1): pin control updates Mark Brown (3): regmap updates regulator updates spi updates Masahiro Yamada (1): Kbuild updates Mauro Carvalho Chehab (1): media updates Max Filippov (1): Xtensa updates Michael Ellerman (1): powerpc updates Michael Tsirkin (1): virtio updates Michal Simek (1): microblaze update Miguel Ojeda (2): auxdisplay updates compiler attributes updates Mike Rapoport (1): memblock updates Mike Snitzer (1): device mapper updates Miklos Szeredi (2): overlayfs update fuse updates Mimi Zohar (1): integrity subsystem updates Miquel Raynal (1): MTD updates Palmer Dabbelt (2): RISC-V updates more RISC-V updates Paolo Bonzini (1): KVM updates Paul McKenney (1): RCU updates Paul Moore (2): selinux update audit updates Pavel Machek (1): LED updates Petr Mladek (2): printk updates livepatching update Rafael Wysocki (7): power management updates ACPI updates device properties framework updates more ACPI updates more power management updates more power management updates more ACPI updates Richard Weinberger (1): UML updates Rob Herring (2): devicetree updates devicetree fixes Russell King (1): ARM development updates Sebastian Reichel (1): power supply and reset updates Shuah Khan (2): KUnit updates Kselftest updates Stafford Horne (1): OpenRISC updates Stefan Richter (1): firewire updates Stephen Boyd (2): clk updates clk fix Steve French (4): initial ksmbd implementation cifs client updates ksmbd fixes smbfs updates Steven Rostedt (3): tracing updates more tracing updates tracing fixes Takashi Iwai (2): sound updates sound fixes Ted Ts'o (1): ext4 updates Tejun Heo (2): cgroup updates workqueue updates Thierry Reding (1): pwm updates Thomas Bogendoerfer (1): MIPS updates Thomas Gleixner (9): debugobjects update SMP core updates locking and atomics updates irq updates x86 cache flush updates x86 PIRQ updates misc x86 updates timer updates CPU hotplug updates Ulf Hansson (1): MMC and MEMSTICK updates Vineet Gupta (1): ARC updates Vinod Koul (1): dmaengine updates Vlastimil Babka (1): SLUB updates Wei Liu (1): hyperv updates Wim Van Sebroeck (1): watchdog updates Wolfram Sang (1): i2c updates ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-12 23:58 Linux 5.15-rc1 Linus Torvalds @ 2021-09-13 3:57 ` Guenter Roeck 2021-09-28 23:18 ` Michael S. Tsirkin 2021-09-13 14:18 ` Dave Jones 1 sibling, 1 reply; 53+ messages in thread From: Guenter Roeck @ 2021-09-13 3:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote: [ ... ] > > And in order to get those fixes going, please go out and test this. > Build results: total: 153 pass: 143 fail: 10 Failed builds: alpha:allmodconfig (15) arm:allmodconfig (1) m68k:allmodconfig (47) mips:allmodconfig (2) parisc:allmodconfig (8) riscv32:allmodconfig (1) riscv:allmodconfig (1) s390:allmodconfig (6) sparc64:allmodconfig (1) xtensa:allmodconfig (53) Qemu test results: total: 479 pass: 478 fail: 1 Failed tests: sparc64:sun4u:nodebug:smp:virtio-pci:net,i82559er:hd The allmodconfig build failures are mostly if not all the result of enabling -Werror on test builds. The individual errors are too many to list; the number of errors is listed in () after the build above. Anyone interested can find build logs at https://kerneltests.org/builders. I am not entirely sure what to do about that - if history teaches a lesson, new build failures may pile up faster than existing ones get fixed, but then I am aware that several fixes are queued already. We'll see how it goes. I'll monitor the situation until maybe -rc4 or -rc5 and then decide if I disable -Werror in my test builds to catch any new "real" problems. The qemu runtime failure bisects to commit 694a1116b405 ("virtio: Bind virtio device to device-tree node"), and reverting that commit fixes the problem. With that patch applied, the virtio block device does not instantiate on sparc64. This results in a crash since that is where the test is trying to boot from. Good news is that I don't see any new runtime warnings. Guenter ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 3:57 ` Guenter Roeck @ 2021-09-28 23:18 ` Michael S. Tsirkin 2021-09-30 13:44 ` Guenter Roeck 0 siblings, 1 reply; 53+ messages in thread From: Michael S. Tsirkin @ 2021-09-28 23:18 UTC (permalink / raw) To: Guenter Roeck; +Cc: Linux Kernel Mailing List On Sun, Sep 12, 2021 at 08:57:50PM -0700, Guenter Roeck wrote: > The qemu runtime failure bisects to commit 694a1116b405 ("virtio: Bind > virtio device to device-tree node"), and reverting that commit fixes the > problem. With that patch applied, the virtio block device does not > instantiate on sparc64. This results in a crash since that is where the > test is trying to boot from. > > Good news is that I don't see any new runtime warnings. > > Guenter I think the fix is now merged by Linus - could you please try it out and confirm that it's ok? Thanks a lot for the testing! -- MST ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-28 23:18 ` Michael S. Tsirkin @ 2021-09-30 13:44 ` Guenter Roeck 0 siblings, 0 replies; 53+ messages in thread From: Guenter Roeck @ 2021-09-30 13:44 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Linux Kernel Mailing List On Tue, Sep 28, 2021 at 07:18:41PM -0400, Michael S. Tsirkin wrote: > On Sun, Sep 12, 2021 at 08:57:50PM -0700, Guenter Roeck wrote: > > The qemu runtime failure bisects to commit 694a1116b405 ("virtio: Bind > > virtio device to device-tree node"), and reverting that commit fixes the > > problem. With that patch applied, the virtio block device does not > > instantiate on sparc64. This results in a crash since that is where the > > test is trying to boot from. > > > > Good news is that I don't see any new runtime warnings. > > > > Guenter > > I think the fix is now merged by Linus - could you please try it out and > confirm that it's ok? Yes, it is. Thanks, Guenter ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-12 23:58 Linux 5.15-rc1 Linus Torvalds 2021-09-13 3:57 ` Guenter Roeck @ 2021-09-13 14:18 ` Dave Jones 2021-09-13 18:59 ` Heiner Kallweit 2021-09-13 23:46 ` Bjorn Helgaas 1 sibling, 2 replies; 53+ messages in thread From: Dave Jones @ 2021-09-13 14:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Heiner Kallweit On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote: > So 5.15 isn't shaping up to be a particularly large release, at least > in number of commits. At only just over 10k non-merge commits, this is > in fact the smallest rc1 we have had in the 5.x series. We're usually > hovering in the 12-14k commit range. This release takes over two minutes longer to boot on one my machines than 5.14. The time just seems to be unaccounted for, even with initcall_debug [ 2.190051] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) [ 2.190057] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3] [ 2.190239] acpi PNP0A08:00: _OSC: platform does not support [PME] [ 2.190360] acpi PNP0A08:00: _OSC: OS now controls [AER PCIeCapability LTR] [ 2.190362] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration [ 2.190783] PCI host bridge to bus 0000:00 [ 2.190785] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] [ 2.190787] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] [ 2.190789] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] [ 2.190790] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff window] [ 2.190792] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff window] [ 2.190794] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff window] [ 2.190795] pci_bus 0000:00: root bus resource [mem 0x000dc000-0x000dffff window] [ 2.190797] pci_bus 0000:00: root bus resource [mem 0xbf200000-0xfeafffff window] [ 2.190799] pci_bus 0000:00: root bus resource [bus 00-fe] [ 2.190828] pci 0000:00:00.0: calling quirk_mmio_always_on+0x0/0x10 @ 1 [ 2.190833] pci 0000:00:00.0: quirk_mmio_always_on+0x0/0x10 took 0 usecs [ 2.190836] pci 0000:00:00.0: [8086:0c00] type 00 class 0x060000 [ 2.190851] pci 0000:00:00.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.190855] pci 0000:00:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.190950] pci 0000:00:01.0: calling quirk_no_aersid+0x0/0x10 @ 1 [ 2.190953] pci 0000:00:01.0: quirk_no_aersid+0x0/0x10 took 0 usecs [ 2.190955] pci 0000:00:01.0: [8086:0c01] type 01 class 0x060400 [ 2.190972] pci 0000:00:01.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.190975] pci 0000:00:01.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.190978] pci 0000:00:01.0: calling pci_fixup_transparent_bridge+0x0/0x20 @ 1 [ 2.190981] pci 0000:00:01.0: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs [ 2.190993] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold [ 2.191233] pci 0000:00:02.0: [8086:0412] type 00 class 0x030000 [ 2.191240] pci 0000:00:02.0: reg 0x10: [mem 0xdf000000-0xdf3fffff 64bit] [ 2.191244] pci 0000:00:02.0: reg 0x18: [mem 0xc0000000-0xcfffffff 64bit pref] [ 2.191248] pci 0000:00:02.0: reg 0x20: [io 0xf000-0xf03f] [ 2.191256] pci 0000:00:02.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.191260] pci 0000:00:02.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.191339] pci 0000:00:03.0: [8086:0c0c] type 00 class 0x040300 [ 2.191345] pci 0000:00:03.0: reg 0x10: [mem 0xdfe34000-0xdfe37fff 64bit] [ 2.191356] pci 0000:00:03.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.191359] pci 0000:00:03.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.191450] pci 0000:00:14.0: [8086:8cb1] type 00 class 0x0c0330 [ 2.191463] pci 0000:00:14.0: reg 0x10: [mem 0xdfe20000-0xdfe2ffff 64bit] [ 2.191492] pci 0000:00:14.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.191495] pci 0000:00:14.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.191514] pci 0000:00:14.0: PME# supported from D3hot D3cold [ 2.191592] pci 0000:00:16.0: [8086:8cba] type 00 class 0x078000 [ 2.191606] pci 0000:00:16.0: reg 0x10: [mem 0xdfe3e000-0xdfe3e00f 64bit] [ 2.191637] pci 0000:00:16.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.191640] pci 0000:00:16.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.191660] pci 0000:00:16.0: PME# supported from D0 D3hot D3cold [ 2.191790] pci 0000:00:19.0: calling quirk_f0_vpd_link+0x0/0x60 @ 1 [ 2.191794] pci 0000:00:19.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs [ 2.191797] pci 0000:00:19.0: [8086:15a1] type 00 class 0x020000 [ 2.191807] pci 0000:00:19.0: reg 0x10: [mem 0xdfe00000-0xdfe1ffff] [ 2.191814] pci 0000:00:19.0: reg 0x14: [mem 0xdfe3c000-0xdfe3cfff] [ 2.191820] pci 0000:00:19.0: reg 0x18: [io 0xf080-0xf09f] [ 2.191844] pci 0000:00:19.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.191847] pci 0000:00:19.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.191869] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold [ 2.191953] pci 0000:00:1a.0: [8086:8cad] type 00 class 0x0c0320 [ 2.191966] pci 0000:00:1a.0: reg 0x10: [mem 0xdfe3b000-0xdfe3b3ff] [ 2.192007] pci 0000:00:1a.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.192010] pci 0000:00:1a.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.192038] pci 0000:00:1a.0: PME# supported from D0 D3hot D3cold [ 2.192121] pci 0000:00:1b.0: [8086:8ca0] type 00 class 0x040300 [ 2.192132] pci 0000:00:1b.0: reg 0x10: [mem 0xdfe30000-0xdfe33fff 64bit] [ 2.192158] pci 0000:00:1b.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.192161] pci 0000:00:1b.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.192187] pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold [ 2.192313] pci 0000:00:1c.0: calling quirk_no_aersid+0x0/0x10 @ 1 [ 2.192316] pci 0000:00:1c.0: quirk_no_aersid+0x0/0x10 took 0 usecs [ 2.192318] pci 0000:00:1c.0: [8086:8c90] type 01 class 0x060400 [ 2.192353] pci 0000:00:1c.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.192356] pci 0000:00:1c.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.192359] pci 0000:00:1c.0: calling pci_fixup_transparent_bridge+0x0/0x20 @ 1 [ 2.192362] pci 0000:00:1c.0: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs [ 2.192387] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold [ 2.192615] pci 0000:00:1c.3: calling quirk_no_aersid+0x0/0x10 @ 1 [ 2.192618] pci 0000:00:1c.3: quirk_no_aersid+0x0/0x10 took 0 usecs [ 2.192620] pci 0000:00:1c.3: [8086:8c96] type 01 class 0x060400 [ 2.192656] pci 0000:00:1c.3: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.192659] pci 0000:00:1c.3: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.192662] pci 0000:00:1c.3: calling pci_fixup_transparent_bridge+0x0/0x20 @ 1 [ 2.192664] pci 0000:00:1c.3: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs [ 2.192689] pci 0000:00:1c.3: PME# supported from D0 D3hot D3cold [ 2.192914] pci 0000:00:1c.4: calling quirk_no_aersid+0x0/0x10 @ 1 [ 2.192917] pci 0000:00:1c.4: quirk_no_aersid+0x0/0x10 took 0 usecs [ 2.192919] pci 0000:00:1c.4: [8086:8c98] type 01 class 0x060400 [ 2.192954] pci 0000:00:1c.4: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.192957] pci 0000:00:1c.4: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.192960] pci 0000:00:1c.4: calling pci_fixup_transparent_bridge+0x0/0x20 @ 1 [ 2.192962] pci 0000:00:1c.4: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs [ 2.192988] pci 0000:00:1c.4: PME# supported from D0 D3hot D3cold [ 2.193213] pci 0000:00:1c.6: calling quirk_no_aersid+0x0/0x10 @ 1 [ 2.193215] pci 0000:00:1c.6: quirk_no_aersid+0x0/0x10 took 0 usecs [ 2.193217] pci 0000:00:1c.6: [8086:8c9c] type 01 class 0x060400 [ 2.193253] pci 0000:00:1c.6: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.193256] pci 0000:00:1c.6: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.193259] pci 0000:00:1c.6: calling pci_fixup_transparent_bridge+0x0/0x20 @ 1 [ 2.193261] pci 0000:00:1c.6: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs [ 2.193286] pci 0000:00:1c.6: PME# supported from D0 D3hot D3cold [ 2.193515] pci 0000:00:1d.0: [8086:8ca6] type 00 class 0x0c0320 [ 2.193528] pci 0000:00:1d.0: reg 0x10: [mem 0xdfe3a000-0xdfe3a3ff] [ 2.193570] pci 0000:00:1d.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.193573] pci 0000:00:1d.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.193601] pci 0000:00:1d.0: PME# supported from D0 D3hot D3cold [ 2.193688] pci 0000:00:1f.0: [8086:8cc4] type 00 class 0x060100 [ 2.193754] pci 0000:00:1f.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.193757] pci 0000:00:1f.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.193855] pci 0000:00:1f.2: [8086:8c82] type 00 class 0x010601 [ 2.193864] pci 0000:00:1f.2: reg 0x10: [io 0xf0d0-0xf0d7] [ 2.193870] pci 0000:00:1f.2: reg 0x14: [io 0xf0c0-0xf0c3] [ 2.193875] pci 0000:00:1f.2: reg 0x18: [io 0xf0b0-0xf0b7] [ 2.193881] pci 0000:00:1f.2: reg 0x1c: [io 0xf0a0-0xf0a3] [ 2.193886] pci 0000:00:1f.2: reg 0x20: [io 0xf060-0xf07f] [ 2.193892] pci 0000:00:1f.2: reg 0x24: [mem 0xdfe39000-0xdfe397ff] [ 2.193901] pci 0000:00:1f.2: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.193904] pci 0000:00:1f.2: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.193923] pci 0000:00:1f.2: PME# supported from D3hot [ 2.193997] pci 0000:00:1f.3: [8086:8ca2] type 00 class 0x0c0500 [ 2.194010] pci 0000:00:1f.3: reg 0x10: [mem 0xdfe38000-0xdfe380ff 64bit] [ 2.194025] pci 0000:00:1f.3: reg 0x20: [io 0xf040-0xf05f] [ 2.194039] pci 0000:00:1f.3: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.194042] pci 0000:00:1f.3: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.194093] pci 0000:01:00.0: calling quirk_f0_vpd_link+0x0/0x60 @ 1 [ 2.194097] pci 0000:01:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs [ 2.194100] pci 0000:01:00.0: [8086:10fb] type 00 class 0x020000 [ 2.194109] pci 0000:01:00.0: reg 0x10: [mem 0xd0080000-0xd00fffff 64bit pref] [ 2.194113] pci 0000:01:00.0: reg 0x18: [io 0xe020-0xe03f] [ 2.194121] pci 0000:01:00.0: reg 0x20: [mem 0xd0104000-0xd0107fff 64bit pref] [ 2.194126] pci 0000:01:00.0: reg 0x30: [mem 0xdfd80000-0xdfdfffff pref] [ 2.194136] pci 0000:01:00.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 2.194139] pci 0000:01:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 2.194164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold * stall here for 86 seconds * [ 88.675114] pci 0000:01:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref] [ 88.675118] pci 0000:01:00.0: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR0 for 64 VFs) [ 88.675125] pci 0000:01:00.0: reg 0x190: [mem 0x00000000-0x00003fff 64bit pref] [ 88.675127] pci 0000:01:00.0: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR3 for 64 VFs) [ 88.675252] pci 0000:01:00.1: calling quirk_f0_vpd_link+0x0/0x60 @ 1 [ 88.675255] pci 0000:01:00.1: quirk_f0_vpd_link+0x0/0x60 took 0 usecs [ 88.675258] pci 0000:01:00.1: [8086:10fb] type 00 class 0x020000 [ 88.675267] pci 0000:01:00.1: reg 0x10: [mem 0xd0000000-0xd007ffff 64bit pref] [ 88.675272] pci 0000:01:00.1: reg 0x18: [io 0xe000-0xe01f] [ 88.675280] pci 0000:01:00.1: reg 0x20: [mem 0xd0100000-0xd0103fff 64bit pref] [ 88.675284] pci 0000:01:00.1: reg 0x30: [mem 0xdfd00000-0xdfd7ffff pref] [ 88.675295] pci 0000:01:00.1: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 88.675298] pci 0000:01:00.1: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 88.675324] pci 0000:01:00.1: PME# supported from D0 D3hot D3cold * stall again for 98 seconds * [ 186.595114] pci 0000:01:00.1: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref] [ 186.595117] pci 0000:01:00.1: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR0 for 64 VFs) [ 186.595124] pci 0000:01:00.1: reg 0x190: [mem 0x00000000-0x00003fff 64bit pref] [ 186.595126] pci 0000:01:00.1: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR3 for 64 VFs) [ 186.595242] pci 0000:00:01.0: PCI bridge to [bus 01] [ 186.595245] pci 0000:00:01.0: bridge window [io 0xe000-0xefff] [ 186.595247] pci 0000:00:01.0: bridge window [mem 0xdfd00000-0xdfdfffff] [ 186.595249] pci 0000:00:01.0: bridge window [mem 0xd0000000-0xd01fffff 64bit pref] [ 186.595251] pci 0000:00:01.0: bridge has subordinate 01 but max busn 02 [ 186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601 [ 186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff] [ 186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref] [ 186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold [ 186.735107] pci 0000:02:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update [ 186.735140] pci 0000:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x2 link at 0000:00:1c.0 (capable of 16.000 Gb/s with 5.0 GT/s PCIe x4 link) [ 186.735255] pci 0000:00:1c.0: PCI bridge to [bus 02] [ 186.735260] pci 0000:00:1c.0: bridge window [mem 0xdfc00000-0xdfcfffff] [ 186.735310] pci 0000:03:00.0: [1b21:1187] type 01 class 0x060400 [ 186.735360] pci 0000:03:00.0: enabling Extended Tags [ 186.735422] pci 0000:03:00.0: PME# supported from D0 D3hot D3cold [ 186.735567] pci 0000:00:1c.3: PCI bridge to [bus 03-0b] [ 186.735570] pci 0000:00:1c.3: bridge window [io 0xb000-0xcfff] [ 186.735573] pci 0000:00:1c.3: bridge window [mem 0xdf400000-0xdf9fffff] [ 186.735649] pci 0000:04:01.0: [1b21:1187] type 01 class 0x060400 [ 186.735702] pci 0000:04:01.0: enabling Extended Tags [ 186.735762] pci 0000:04:01.0: PME# supported from D0 D3hot D3cold [ 186.735848] pci 0000:04:02.0: [1b21:1187] type 01 class 0x060400 [ 186.735901] pci 0000:04:02.0: enabling Extended Tags [ 186.735961] pci 0000:04:02.0: PME# supported from D0 D3hot D3cold [ 186.736048] pci 0000:04:03.0: [1b21:1187] type 01 class 0x060400 [ 186.736100] pci 0000:04:03.0: enabling Extended Tags [ 186.736160] pci 0000:04:03.0: PME# supported from D0 D3hot D3cold [ 186.736243] pci 0000:04:04.0: [1b21:1187] type 01 class 0x060400 [ 186.736296] pci 0000:04:04.0: enabling Extended Tags [ 186.736355] pci 0000:04:04.0: PME# supported from D0 D3hot D3cold [ 186.736439] pci 0000:04:05.0: [1b21:1187] type 01 class 0x060400 [ 186.736492] pci 0000:04:05.0: enabling Extended Tags [ 186.736551] pci 0000:04:05.0: PME# supported from D0 D3hot D3cold [ 186.736635] pci 0000:04:06.0: [1b21:1187] type 01 class 0x060400 [ 186.736688] pci 0000:04:06.0: enabling Extended Tags [ 186.736747] pci 0000:04:06.0: PME# supported from D0 D3hot D3cold [ 186.736830] pci 0000:04:07.0: [1b21:1187] type 01 class 0x060400 [ 186.736883] pci 0000:04:07.0: enabling Extended Tags [ 186.736943] pci 0000:04:07.0: PME# supported from D0 D3hot D3cold [ 186.737024] pci 0000:03:00.0: PCI bridge to [bus 04-0b] [ 186.737030] pci 0000:03:00.0: bridge window [io 0xb000-0xcfff] [ 186.737034] pci 0000:03:00.0: bridge window [mem 0xdf400000-0xdf9fffff] [ 186.737071] pci 0000:04:01.0: PCI bridge to [bus 05] [ 186.737153] pci 0000:06:00.0: [14e4:43b1] type 00 class 0x028000 [ 186.737183] pci 0000:06:00.0: reg 0x10: [mem 0xdf600000-0xdf607fff 64bit] [ 186.737202] pci 0000:06:00.0: reg 0x18: [mem 0xdf400000-0xdf5fffff 64bit] [ 186.737372] pci 0000:06:00.0: supports D1 D2 [ 186.737374] pci 0000:06:00.0: PME# supported from D0 D1 D2 D3hot D3cold [ 186.737516] pci 0000:04:02.0: PCI bridge to [bus 06] [ 186.737524] pci 0000:04:02.0: bridge window [mem 0xdf400000-0xdf6fffff] [ 186.737586] pci 0000:07:00.0: calling fixup_mpss_256+0x0/0x20 @ 1 [ 186.737590] pci 0000:07:00.0: fixup_mpss_256+0x0/0x20 took 0 usecs [ 186.737593] pci 0000:07:00.0: [1b21:0612] type 00 class 0x010601 [ 186.737617] pci 0000:07:00.0: reg 0x10: [io 0xc050-0xc057] [ 186.737630] pci 0000:07:00.0: reg 0x14: [io 0xc040-0xc043] [ 186.737643] pci 0000:07:00.0: reg 0x18: [io 0xc030-0xc037] [ 186.737657] pci 0000:07:00.0: reg 0x1c: [io 0xc020-0xc023] [ 186.737670] pci 0000:07:00.0: reg 0x20: [io 0xc000-0xc01f] [ 186.737683] pci 0000:07:00.0: reg 0x24: [mem 0xdf900000-0xdf9001ff] [ 186.737851] pci 0000:04:03.0: PCI bridge to [bus 07] [ 186.737857] pci 0000:04:03.0: bridge window [io 0xc000-0xcfff] [ 186.737860] pci 0000:04:03.0: bridge window [mem 0xdf900000-0xdf9fffff] [ 186.737899] pci 0000:04:04.0: PCI bridge to [bus 08] [ 186.737943] pci 0000:04:05.0: PCI bridge to [bus 09] [ 186.738025] pci 0000:0a:00.0: calling quirk_f0_vpd_link+0x0/0x60 @ 1 [ 186.738028] pci 0000:0a:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs [ 186.738031] pci 0000:0a:00.0: [8086:1539] type 00 class 0x020000 [ 186.738058] pci 0000:0a:00.0: reg 0x10: [mem 0xdf800000-0xdf81ffff] [ 186.738082] pci 0000:0a:00.0: reg 0x18: [io 0xb000-0xb01f] [ 186.738095] pci 0000:0a:00.0: reg 0x1c: [mem 0xdf820000-0xdf823fff] [ 186.738159] pci 0000:0a:00.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 [ 186.738162] pci 0000:0a:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs [ 186.738265] pci 0000:0a:00.0: PME# supported from D0 D3hot D3cold [ 186.738397] pci 0000:04:06.0: PCI bridge to [bus 0a] [ 186.738403] pci 0000:04:06.0: bridge window [io 0xb000-0xbfff] [ 186.738407] pci 0000:04:06.0: bridge window [mem 0xdf800000-0xdf8fffff] [ 186.738448] pci 0000:04:07.0: PCI bridge to [bus 0b] [ 186.738540] pci 0000:0c:00.0: calling fixup_mpss_256+0x0/0x20 @ 1 [ 186.738543] pci 0000:0c:00.0: fixup_mpss_256+0x0/0x20 took 0 usecs [ 186.738546] pci 0000:0c:00.0: [1b21:0612] type 00 class 0x010601 [ 186.738563] pci 0000:0c:00.0: reg 0x10: [io 0xd050-0xd057] [ 186.738572] pci 0000:0c:00.0: reg 0x14: [io 0xd040-0xd043] [ 186.738582] pci 0000:0c:00.0: reg 0x18: [io 0xd030-0xd037] [ 186.738591] pci 0000:0c:00.0: reg 0x1c: [io 0xd020-0xd023] [ 186.738601] pci 0000:0c:00.0: reg 0x20: [io 0xd000-0xd01f] [ 186.738610] pci 0000:0c:00.0: reg 0x24: [mem 0xdfb00000-0xdfb001ff] [ 186.738799] pci 0000:00:1c.4: PCI bridge to [bus 0c] [ 186.738802] pci 0000:00:1c.4: bridge window [io 0xd000-0xdfff] [ 186.738805] pci 0000:00:1c.4: bridge window [mem 0xdfb00000-0xdfbfffff] [ 186.738855] pci 0000:0d:00.0: [1b21:1142] type 00 class 0x0c0330 [ 186.738881] pci 0000:0d:00.0: reg 0x10: [mem 0xdfa00000-0xdfa07fff 64bit] [ 186.739001] pci 0000:0d:00.0: PME# supported from D3cold [ 186.739136] pci 0000:00:1c.6: PCI bridge to [bus 0d] [ 186.739140] pci 0000:00:1c.6: bridge window [mem 0xdfa00000-0xdfafffff] [ 186.739160] pci_bus 0000:00: on NUMA node 0 * boot seems to continue as normal from here * I did a bisect, which landed in the VPD rework that Heiner landed in this merge window. Specifically it ended up on ... git bisect start # good: [7d2a07b769330c34b4deabeed939325c77a7ec2f] Linux 5.14 git bisect good 7d2a07b769330c34b4deabeed939325c77a7ec2f # bad: [6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f] Linux 5.15-rc1 git bisect bad 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f # good: [1b4f3dfb4792f03b139edf10124fcbeb44e608e6] Merge tag 'usb-serial-5.15-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next git bisect good 1b4f3dfb4792f03b139edf10124fcbeb44e608e6 # good: [c793011242d182e5f12800c12dbaf37af80be735] Merge tag 'pinctrl-v5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl git bisect good c793011242d182e5f12800c12dbaf37af80be735 # good: [5e6a5845dd651b00754a62edec2f0a439182024d] Merge tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux git bisect good 5e6a5845dd651b00754a62edec2f0a439182024d # bad: [0f4b9289bad354b606190a4cd54d5222b4e41d98] Merge tag 'docs-5.15-2' of git://git.lwn.net/linux git bisect bad 0f4b9289bad354b606190a4cd54d5222b4e41d98 # good: [626bf91a292e2035af5b9d9cce35c5c138dfe06d] Merge tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net git bisect good 626bf91a292e2035af5b9d9cce35c5c138dfe06d # bad: [49832c819ab85b33b7a2a1429c8d067e82be2977] Makefile: use -Wno-main in the full kernel tree git bisect bad 49832c819ab85b33b7a2a1429c8d067e82be2977 # bad: [739c4747a25af3a2571f69e4709f2b476a17d5d4] Merge branch 'pci/misc' git bisect bad 739c4747a25af3a2571f69e4709f2b476a17d5d4 # bad: [2c208abd4f9efac02622d8f3c9989f4b7b1ad973] PCI/VPD: Use unaligned access helpers git bisect bad 2c208abd4f9efac02622d8f3c9989f4b7b1ad973 # bad: [f240e15097c5004811a58f2cbc170bf90d06d0a9] tg3: Read VPD with pci_vpd_alloc() git bisect bad f240e15097c5004811a58f2cbc170bf90d06d0a9 # good: [d27f7344ba89897d0ce6ebe0c9eecbe214f9bb8f] PCI/VPD: Reorder pci_read_vpd(), pci_write_vpd() git bisect good d27f7344ba89897d0ce6ebe0c9eecbe214f9bb8f # bad: [fe7568cf2f2dc3a0783f6ffdb3802c1ce2085466] PCI/VPD: Treat invalid VPD like missing VPD capability git bisect bad fe7568cf2f2dc3a0783f6ffdb3802c1ce2085466 # good: [22ff2bcec704a7a8c43a998251e0757cd2de66e1] PCI/VPD: Remove struct pci_vpd.valid member git bisect good 22ff2bcec704a7a8c43a998251e0757cd2de66e1 # bad: [7bac54497c3e3b2ca37b7043f1fa78586540f10e] PCI/VPD: Determine VPD size in pci_vpd_init() git bisect bad 7bac54497c3e3b2ca37b7043f1fa78586540f10e # good: [fd00faa375fbb9d46ae0730d0faf4a3006301005] PCI/VPD: Embed struct pci_vpd in struct pci_dev git bisect good fd00faa375fbb9d46ae0730d0faf4a3006301005 # first bad commit: [7bac54497c3e3b2ca37b7043f1fa78586540f10e] PCI/VPD: Determine VPD size in pci_vpd_init() 7bac54497c3e3b2ca37b7043f1fa78586540f10e is the first bad commit commit 7bac54497c3e3b2ca37b7043f1fa78586540f10e Author: Heiner Kallweit <hkallweit1@gmail.com> Date: Sun Aug 8 19:22:52 2021 +0200 PCI/VPD: Determine VPD size in pci_vpd_init() Determine VPD size in pci_vpd_init(). Quirks set dev->vpd.len to a non-zero value, so they cause us to skip the dynamic size calculation. Prerequisite is that we move the quirks from FINAL to HEADER so they are run before pci_vpd_init(). Link: https://lore.kernel.org/r/cc4a6538-557a-294d-4f94-e6d1d3c91589@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Which unfortunately doesn't revert cleanly I can't test it reverted in isolation. My guess is there's something quirky about the PCI bus on this machine that causes stalls until we hit timeout, but I'm not sure where to begin debugging this. There's nothing too exotic on this machine: 00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06) 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06) 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) 00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06) 00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller 00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V 00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2 00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller 00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0) 00:1c.3 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 4 (rev d0) 00:1c.4 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 5 (rev d0) 00:1c.6 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 (rev d0) 00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1 00:1f.0 ISA bridge: Intel Corporation Z97 Chipset LPC Controller 00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] 00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) 01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) 03:00.0 PCI bridge: ASMedia Technology Inc. Device 1187 04:01.0 PCI bridge: ASMedia Technology Inc. Device 1187 04:02.0 PCI bridge: ASMedia Technology Inc. Device 1187 04:03.0 PCI bridge: ASMedia Technology Inc. Device 1187 04:04.0 PCI bridge: ASMedia Technology Inc. Device 1187 04:05.0 PCI bridge: ASMedia Technology Inc. Device 1187 04:06.0 PCI bridge: ASMedia Technology Inc. Device 1187 04:07.0 PCI bridge: ASMedia Technology Inc. Device 1187 06:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4352 802.11ac Wireless Network Adapter (rev 03) 07:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02) 0a:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03) 0c:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02) 0d:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 14:18 ` Dave Jones @ 2021-09-13 18:59 ` Heiner Kallweit 2021-09-13 19:51 ` Linus Torvalds 2021-09-13 20:15 ` Dave Jones 2021-09-13 23:46 ` Bjorn Helgaas 1 sibling, 2 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-13 18:59 UTC (permalink / raw) To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas Cc: linux-pci On 13.09.2021 16:18, Dave Jones wrote: > [ 186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601 > [ 186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff] > [ 186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref] > [ 186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold > [ 186.735107] pci 0000:02:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update Thanks for the report! The stalls may be related to this one. Device is: 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) With an older kernel you may experience the stall when accessing the vpd attribute of this device in sysfs. Maybe the device indicates VPD capability but doesn't actually support it. Could you please provide the "lspci -vv" output for this device? And could you please test with the following applied to verify the assumption? It disables VPD access for this device. --- drivers/pci/vpd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c index 517789205..fc92e880e 100644 --- a/drivers/pci/vpd.c +++ b/drivers/pci/vpd.c @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd); /* * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class. -- 2.33.0 ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 18:59 ` Heiner Kallweit @ 2021-09-13 19:51 ` Linus Torvalds 2021-09-13 20:09 ` Bjorn Helgaas 2021-09-13 20:11 ` Heiner Kallweit 2021-09-13 20:15 ` Dave Jones 1 sibling, 2 replies; 53+ messages in thread From: Linus Torvalds @ 2021-09-13 19:51 UTC (permalink / raw) To: Heiner Kallweit Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On Mon, Sep 13, 2021 at 12:00 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > With an older kernel you may experience the stall when accessing the vpd > attribute of this device in sysfs. Honestly, that old behavior seems to be the *much* better behavior. A synchronous stall at boot time is truly annoying, and a pain to deal with (and debug). That pci_vpd_read() function is clearly NOT designed to deal with boot-time callers in the first place, so I think that commit is simply wrong. And yes, I see that "128ms timeout". If it was _one_ timeout, that would be one thing,. But it looks like it's repeated over and over. Not acceptable at boot time. Not at all. Bjorn. Please revert. Or I can do it. Linus ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 19:51 ` Linus Torvalds @ 2021-09-13 20:09 ` Bjorn Helgaas 2021-09-13 20:11 ` Heiner Kallweit 1 sibling, 0 replies; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-13 20:09 UTC (permalink / raw) To: Linus Torvalds Cc: Heiner Kallweit, Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On Mon, Sep 13, 2021 at 12:51:30PM -0700, Linus Torvalds wrote: > On Mon, Sep 13, 2021 at 12:00 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > > > With an older kernel you may experience the stall when accessing the vpd > > attribute of this device in sysfs. > > Honestly, that old behavior seems to be the *much* better behavior. > > A synchronous stall at boot time is truly annoying, and a pain to deal > with (and debug). > > That pci_vpd_read() function is clearly NOT designed to deal with > boot-time callers in the first place, so I think that commit is simply > wrong. > > And yes, I see that "128ms timeout". If it was _one_ timeout, that > would be one thing,. But it looks like it's repeated over and over. > > Not acceptable at boot time. Not at all. > > Bjorn. Please revert. Or I can do it. Sorry about this. Dave said it wasn't a trivial revert, but I'll be happy to work with Heiner and Dave to revert and test it. I agree that we shouldn't read VPD at boot-time unless we actually need the data then. Bjorn ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 19:51 ` Linus Torvalds 2021-09-13 20:09 ` Bjorn Helgaas @ 2021-09-13 20:11 ` Heiner Kallweit 2021-09-13 20:15 ` Linus Torvalds 1 sibling, 1 reply; 53+ messages in thread From: Heiner Kallweit @ 2021-09-13 20:11 UTC (permalink / raw) To: Linus Torvalds Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On 13.09.2021 21:51, Linus Torvalds wrote: > On Mon, Sep 13, 2021 at 12:00 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: >> >> With an older kernel you may experience the stall when accessing the vpd >> attribute of this device in sysfs. > > Honestly, that old behavior seems to be the *much* better behavior. > > A synchronous stall at boot time is truly annoying, and a pain to deal > with (and debug). > > That pci_vpd_read() function is clearly NOT designed to deal with > boot-time callers in the first place, so I think that commit is simply > wrong. > > And yes, I see that "128ms timeout". If it was _one_ timeout, that > would be one thing,. But it looks like it's repeated over and over. > No. The timeout is not the issue, otherwise you would see the message "VPD access failed.." over and over again. The issue here seems to be that this call in PCI config space access to adress vpd->cap + PCI_VPD_ADDR stalls. In a first place this seems to be due to a buggy device. We'll know for sure once Dave checks with the test patch applied. To deal with such buggy devices we have the VPD blacklist quirk. Secondly you could blame the PCI subsystem for not detecting stalled access to a buggy device. However I don't know the PCIe spec good enough to really comment on this. > Not acceptable at boot time. Not at all. > > Bjorn. Please revert. Or I can do it. > > Linus > Heiner ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 20:11 ` Heiner Kallweit @ 2021-09-13 20:15 ` Linus Torvalds 0 siblings, 0 replies; 53+ messages in thread From: Linus Torvalds @ 2021-09-13 20:15 UTC (permalink / raw) To: Heiner Kallweit Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On Mon, Sep 13, 2021 at 1:11 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > No. The timeout is not the issue, otherwise you would see the message > "VPD access failed.." over and over again. Ahh, I did check that that error did exist, but you're right, it's not there all the time. > The issue here seems to be > that this call in PCI config space access to adress > vpd->cap + PCI_VPD_ADDR stalls. Ok. Regardless, we shouldn't do this since the boot code shouldn't need any of the VPD information. Linus ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 18:59 ` Heiner Kallweit 2021-09-13 19:51 ` Linus Torvalds @ 2021-09-13 20:15 ` Dave Jones 2021-09-13 20:22 ` Heiner Kallweit 2021-09-13 20:32 ` Heiner Kallweit 1 sibling, 2 replies; 53+ messages in thread From: Dave Jones @ 2021-09-13 20:15 UTC (permalink / raw) To: Heiner Kallweit Cc: Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On Mon, Sep 13, 2021 at 08:59:49PM +0200, Heiner Kallweit wrote: > On 13.09.2021 16:18, Dave Jones wrote: > > [ 186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601 > > [ 186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff] > > [ 186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref] > > [ 186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold > > [ 186.735107] pci 0000:02:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update > > Thanks for the report! The stalls may be related to this one. Device is: > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) > > With an older kernel you may experience the stall when accessing the vpd > attribute of this device in sysfs. > > Maybe the device indicates VPD capability but doesn't actually support it. > Could you please provide the "lspci -vv" output for this device? 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0]) Subsystem: Samsung Electronics Co Ltd XP941 PCIe SSD Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 5: Memory at dfc10000 (32-bit, non-prefetchable) [size=8K] Expansion ROM at dfc00000 [disabled] [size=64K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s (ok), Width x2 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+ 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [d0] Vital Product Data Not readable Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00 Capabilities: [150 v1] Power Budgeting <?> Capabilities: [160 v1] Latency Tolerance Reporting Max snoop latency: 71680ns Max no snoop latency: 71680ns Kernel driver in use: ahci > And could you please test with the following applied to verify the > assumption? It disables VPD access for this device. > > --- > drivers/pci/vpd.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c > index 517789205..fc92e880e 100644 > --- a/drivers/pci/vpd.c > +++ b/drivers/pci/vpd.c > @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd); > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd); > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd); > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd); > /* > * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port > * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class. This didn't help I'm afraid :( It changed the VPD warning, but that's about it... [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 20:15 ` Dave Jones @ 2021-09-13 20:22 ` Heiner Kallweit 2021-09-13 20:32 ` Dave Jones 2021-09-13 20:32 ` Heiner Kallweit 1 sibling, 1 reply; 53+ messages in thread From: Heiner Kallweit @ 2021-09-13 20:22 UTC (permalink / raw) To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On 13.09.2021 22:15, Dave Jones wrote: > On Mon, Sep 13, 2021 at 08:59:49PM +0200, Heiner Kallweit wrote: > > On 13.09.2021 16:18, Dave Jones wrote: > > > [ 186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601 > > > [ 186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff] > > > [ 186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref] > > > [ 186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold > > > [ 186.735107] pci 0000:02:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update > > > > Thanks for the report! The stalls may be related to this one. Device is: > > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) > > > > With an older kernel you may experience the stall when accessing the vpd > > attribute of this device in sysfs. > > > > Maybe the device indicates VPD capability but doesn't actually support it. > > Could you please provide the "lspci -vv" output for this device? > > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0]) > Subsystem: Samsung Electronics Co Ltd XP941 PCIe SSD > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0, Cache Line Size: 64 bytes > Interrupt: pin A routed to IRQ 16 > Region 5: Memory at dfc10000 (32-bit, non-prefetchable) [size=8K] > Expansion ROM at dfc00000 [disabled] [size=64K] > Capabilities: [40] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [70] Express (v2) Endpoint, MSI 00 > DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset- > MaxPayload 128 bytes, MaxReadReq 512 bytes > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- > LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us > ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ > LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 5GT/s (ok), Width x2 (downgraded) > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+ > 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- > EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- > FRS- TPHComp- ExtTPHComp- > AtomicOpsCap: 32bit- 64bit- 128bitCAS- > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, > AtomicOpsCtl: ReqEn- > LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS- > LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- > EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- > Retimer- 2Retimers- CrosslinkRes: unsupported > Capabilities: [d0] Vital Product Data > Not readable > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ > AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- > MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- > HeaderLog: 00000000 00000000 00000000 00000000 > Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00 > Capabilities: [150 v1] Power Budgeting <?> > Capabilities: [160 v1] Latency Tolerance Reporting > Max snoop latency: 71680ns > Max no snoop latency: 71680ns > Kernel driver in use: ahci > > > > And could you please test with the following applied to verify the > > assumption? It disables VPD access for this device. > > > > --- > > drivers/pci/vpd.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c > > index 517789205..fc92e880e 100644 > > --- a/drivers/pci/vpd.c > > +++ b/drivers/pci/vpd.c > > @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd); > > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd); > > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd); > > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); > > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd); > > /* > > * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port > > * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class. > > > This didn't help I'm afraid :( > It changed the VPD warning, but that's about it... > > [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 > [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) > [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs > With this patch there's no VPD access to this device any longer. So this can't be the root cause. Do you have any other PCI device that has VPD capability? -> Capabilities: [...] Vital Product Data > > Dave > Heiner ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 20:22 ` Heiner Kallweit @ 2021-09-13 20:32 ` Dave Jones 2021-09-13 20:44 ` [Intel-wired-lan] " Heiner Kallweit 2021-09-13 20:59 ` Linux 5.15-rc1 Heiner Kallweit 0 siblings, 2 replies; 53+ messages in thread From: Dave Jones @ 2021-09-13 20:32 UTC (permalink / raw) To: Heiner Kallweit Cc: Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote: > > This didn't help I'm afraid :( > > It changed the VPD warning, but that's about it... > > > > [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 > > [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) > > [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs > > > With this patch there's no VPD access to this device any longer. So this can't be > the root cause. Do you have any other PCI device that has VPD capability? > -> Capabilities: [...] Vital Product Data 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) Subsystem: Device 1dcf:030a ... Capabilities: [e0] Vital Product Data Unknown small resource type 06, will not decode more. I'll add that to the quirk list and see if that helps. Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-13 20:32 ` Dave Jones @ 2021-09-13 20:44 ` Heiner Kallweit 2021-09-13 20:59 ` Linux 5.15-rc1 Heiner Kallweit 1 sibling, 0 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-13 20:44 UTC (permalink / raw) To: Jesse Brandeburg, Tony Nguyen Cc: linux-pci, Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas, Dave Jones, intel-wired-lan On 13.09.2021 22:32, Dave Jones wrote: + Jesse and Tony as Intel NIC maintainers > On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote: > > > > This didn't help I'm afraid :( > > > It changed the VPD warning, but that's about it... > > > > > > [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 > > > [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) > > > [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs > > > > > With this patch there's no VPD access to this device any longer. So this can't be > > the root cause. Do you have any other PCI device that has VPD capability? > > -> Capabilities: [...] Vital Product Data > > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > Subsystem: Device 1dcf:030a > ... > Capabilities: [e0] Vital Product Data > Unknown small resource type 06, will not decode more. > When searching I found the same symptom of invalid VPD data for 82599EB. Do these adapters have non-VPD data in VPD address space? Or is the actual VPD data at another offset than 0? I know that few Chelsio devices have such a non-standard VPD structure. > > I'll add that to the quirk list and see if that helps. > > Dave > Heiner ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-13 20:44 ` Heiner Kallweit 0 siblings, 0 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-13 20:44 UTC (permalink / raw) To: intel-wired-lan On 13.09.2021 22:32, Dave Jones wrote: + Jesse and Tony as Intel NIC maintainers > On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote: > > > > This didn't help I'm afraid :( > > > It changed the VPD warning, but that's about it... > > > > > > [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 > > > [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) > > > [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs > > > > > With this patch there's no VPD access to this device any longer. So this can't be > > the root cause. Do you have any other PCI device that has VPD capability? > > -> Capabilities: [...] Vital Product Data > > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > Subsystem: Device 1dcf:030a > ... > Capabilities: [e0] Vital Product Data > Unknown small resource type 06, will not decode more. > When searching I found the same symptom of invalid VPD data for 82599EB. Do these adapters have non-VPD data in VPD address space? Or is the actual VPD data at another offset than 0? I know that few Chelsio devices have such a non-standard VPD structure. > > I'll add that to the quirk list and see if that helps. > > Dave > Heiner ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-13 20:44 ` [Intel-wired-lan] " Heiner Kallweit @ 2021-09-13 23:32 ` Hisashi T Fujinaka -1 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-13 23:32 UTC (permalink / raw) To: Heiner Kallweit Cc: Jesse Brandeburg, Tony Nguyen, Dave Jones, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas, Linus Torvalds, todd.fujinaka On Mon, 13 Sep 2021, Heiner Kallweit wrote: > On 13.09.2021 22:32, Dave Jones wrote: > > + Jesse and Tony as Intel NIC maintainers > >> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote: >> >> >> This didn't help I'm afraid :( >> >> It changed the VPD warning, but that's about it... >> >> >> >> [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 >> >> [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) >> >> [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs >> >> >> > With this patch there's no VPD access to this device any longer. So this can't be >> > the root cause. Do you have any other PCI device that has VPD capability? >> > -> Capabilities: [...] Vital Product Data >> >> >> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >> Subsystem: Device 1dcf:030a >> ... >> Capabilities: [e0] Vital Product Data >> Unknown small resource type 06, will not decode more. >> > > When searching I found the same symptom of invalid VPD data for 82599EB. > Do these adapters have non-VPD data in VPD address space? Or is the actual > VPD data at another offset than 0? I know that few Chelsio devices have > such a non-standard VPD structure. > >> >> I'll add that to the quirk list and see if that helps. >> >> Dave >> > Heiner Sorry to reply from my personal account. If I did it from my work account I'd be top-posting because of Outlook and that goes over like a lead balloon. Anyway, can you send us a dump of your eeprom using ethtool -e? You can either send it via a bug on e1000.sourceforge.net or try sending it to todd.fujinaka@intel.com The other thing is I'm wondering is what the subvendor device ID you have is referring to because it's not in the pci database. Some ODMs like getting creative with what they put in the NVM. Todd Fujinaka (todd.fujinaka@intel.com) ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-13 23:32 ` Hisashi T Fujinaka 0 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-13 23:32 UTC (permalink / raw) To: intel-wired-lan On Mon, 13 Sep 2021, Heiner Kallweit wrote: > On 13.09.2021 22:32, Dave Jones wrote: > > + Jesse and Tony as Intel NIC maintainers > >> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote: >> >> >> This didn't help I'm afraid :( >> >> It changed the VPD warning, but that's about it... >> >> >> >> [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 >> >> [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) >> >> [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs >> >> >> > With this patch there's no VPD access to this device any longer. So this can't be >> > the root cause. Do you have any other PCI device that has VPD capability? >> > -> Capabilities: [...] Vital Product Data >> >> >> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >> Subsystem: Device 1dcf:030a >> ... >> Capabilities: [e0] Vital Product Data >> Unknown small resource type 06, will not decode more. >> > > When searching I found the same symptom of invalid VPD data for 82599EB. > Do these adapters have non-VPD data in VPD address space? Or is the actual > VPD data at another offset than 0? I know that few Chelsio devices have > such a non-standard VPD structure. > >> >> I'll add that to the quirk list and see if that helps. >> >> Dave >> > Heiner Sorry to reply from my personal account. If I did it from my work account I'd be top-posting because of Outlook and that goes over like a lead balloon. Anyway, can you send us a dump of your eeprom using ethtool -e? You can either send it via a bug on e1000.sourceforge.net or try sending it to todd.fujinaka at intel.com The other thing is I'm wondering is what the subvendor device ID you have is referring to because it's not in the pci database. Some ODMs like getting creative with what they put in the NVM. Todd Fujinaka (todd.fujinaka at intel.com) ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-13 23:32 ` Hisashi T Fujinaka @ 2021-09-14 5:51 ` Heiner Kallweit -1 siblings, 0 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-14 5:51 UTC (permalink / raw) To: Dave Jones Cc: Jesse Brandeburg, Tony Nguyen, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas, todd.fujinaka, Hisashi T Fujinaka On 14.09.2021 01:32, Hisashi T Fujinaka wrote: > On Mon, 13 Sep 2021, Heiner Kallweit wrote: > >> On 13.09.2021 22:32, Dave Jones wrote: >> >> + Jesse and Tony as Intel NIC maintainers >> >>> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote: >>> >>> >> This didn't help I'm afraid :( >>> >> It changed the VPD warning, but that's about it... >>> >> >>> >> [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 >>> >> [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) >>> >> [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs >>> >> >>> > With this patch there's no VPD access to this device any longer. So this can't be >>> > the root cause. Do you have any other PCI device that has VPD capability? >>> > -> Capabilities: [...] Vital Product Data >>> >>> >>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >>> Subsystem: Device 1dcf:030a >>> ... >>> Capabilities: [e0] Vital Product Data >>> Unknown small resource type 06, will not decode more. >>> >> >> When searching I found the same symptom of invalid VPD data for 82599EB. >> Do these adapters have non-VPD data in VPD address space? Or is the actual >> VPD data at another offset than 0? I know that few Chelsio devices have >> such a non-standard VPD structure. >> >>> >>> I'll add that to the quirk list and see if that helps. >>> >>> Dave >>> >> Heiner > > Sorry to reply from my personal account. If I did it from my work > account I'd be top-posting because of Outlook and that goes over like a > lead balloon. > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can > either send it via a bug on e1000.sourceforge.net or try sending it to > todd.fujinaka@intel.com > > The other thing is I'm wondering is what the subvendor device ID you > have is referring to because it's not in the pci database. Some ODMs > like getting creative with what they put in the NVM. > > Todd Fujinaka (todd.fujinaka@intel.com) Thanks for the prompt reply. Dave, could you please provide the requested information? ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-14 5:51 ` Heiner Kallweit 0 siblings, 0 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-14 5:51 UTC (permalink / raw) To: intel-wired-lan On 14.09.2021 01:32, Hisashi T Fujinaka wrote: > On Mon, 13 Sep 2021, Heiner Kallweit wrote: > >> On 13.09.2021 22:32, Dave Jones wrote: >> >> + Jesse and Tony as Intel NIC maintainers >> >>> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote: >>> >>> >> This didn't help I'm afraid :( >>> >> It changed the VPD warning, but that's about it... >>> >> >>> >> [? 184.235496] pci 0000:02:00.0: calling? quirk_blacklist_vpd+0x0/0x22 @ 1 >>> >> [? 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) >>> >> [? 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs >>> >> >>> > With this patch there's no VPD access to this device any longer. So this can't be >>> > the root cause. Do you have any other PCI device that has VPD capability? >>> > -> Capabilities: [...] Vital Product Data >>> >>> >>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >>> ??????? Subsystem: Device 1dcf:030a >>> ????... >>> ??????????? Capabilities: [e0] Vital Product Data >>> ??????????????? Unknown small resource type 06, will not decode more. >>> >> >> When searching I found the same symptom of invalid VPD data for 82599EB. >> Do these adapters have non-VPD data in VPD address space? Or is the actual >> VPD data at another offset than 0? I know that few Chelsio devices have >> such a non-standard VPD structure. >> >>> >>> I'll add that to the quirk list and see if that helps. >>> >>> ????Dave >>> >> Heiner > > Sorry to reply from my personal account. If I did it from my work > account I'd be top-posting because of Outlook and that goes over like a > lead balloon. > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can > either send it via a bug on e1000.sourceforge.net or try sending it to > todd.fujinaka at intel.com > > The other thing is I'm wondering is what the subvendor device ID you > have is referring to because it's not in the pci database. Some ODMs > like getting creative with what they put in the NVM. > > Todd Fujinaka (todd.fujinaka at intel.com) Thanks for the prompt reply. Dave, could you please provide the requested information? ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-14 5:51 ` Heiner Kallweit @ 2021-09-14 14:24 ` Dave Jones -1 siblings, 0 replies; 53+ messages in thread From: Dave Jones @ 2021-09-14 14:24 UTC (permalink / raw) To: Heiner Kallweit Cc: Jesse Brandeburg, Tony Nguyen, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas, todd.fujinaka, Hisashi T Fujinaka On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: > > Sorry to reply from my personal account. If I did it from my work > > account I'd be top-posting because of Outlook and that goes over like a > > lead balloon. > > > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can > > either send it via a bug on e1000.sourceforge.net or try sending it to > > todd.fujinaka@intel.com > > > > The other thing is I'm wondering is what the subvendor device ID you > > have is referring to because it's not in the pci database. Some ODMs > > like getting creative with what they put in the NVM. > > > > Todd Fujinaka (todd.fujinaka@intel.com) > > Thanks for the prompt reply. Dave, could you please provide the requested > information? sent off-list. Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-14 14:24 ` Dave Jones 0 siblings, 0 replies; 53+ messages in thread From: Dave Jones @ 2021-09-14 14:24 UTC (permalink / raw) To: intel-wired-lan On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: > > Sorry to reply from my personal account. If I did it from my work > > account I'd be top-posting because of Outlook and that goes over like a > > lead balloon. > > > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can > > either send it via a bug on e1000.sourceforge.net or try sending it to > > todd.fujinaka at intel.com > > > > The other thing is I'm wondering is what the subvendor device ID you > > have is referring to because it's not in the pci database. Some ODMs > > like getting creative with what they put in the NVM. > > > > Todd Fujinaka (todd.fujinaka at intel.com) > > Thanks for the prompt reply. Dave, could you please provide the requested > information? sent off-list. Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* RE: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-14 14:24 ` Dave Jones @ 2021-09-14 18:28 ` Fujinaka, Todd -1 siblings, 0 replies; 53+ messages in thread From: Fujinaka, Todd @ 2021-09-14 18:28 UTC (permalink / raw) To: Dave Jones, Heiner Kallweit Cc: Brandeburg, Jesse, Nguyen, Anthony L, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas, Hisashi T Fujinaka Wow, I'm waiting for the hardware guy to look at this but this is an off-brand 10Gtek NIC from Amazon that just has nonsense data in the VPD as far as I can tell (3's). I'll let you know if I find out that the critical settings are bad, but I would just ignore any VPD errors. Todd Fujinaka Software Application Engineer Ethernet Products Group Intel Corporation todd.fujinaka@intel.com -----Original Message----- From: Dave Jones <davej@codemonkey.org.uk> Sent: Tuesday, September 14, 2021 7:24 AM To: Heiner Kallweit <hkallweit1@gmail.com> Cc: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; linux-pci@vger.kernel.org; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; intel-wired-lan <intel-wired-lan@lists.osuosl.org>; Bjorn Helgaas <bhelgaas@google.com>; Fujinaka, Todd <todd.fujinaka@intel.com>; Hisashi T Fujinaka <htodd@twofifty.com> Subject: Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: > > Sorry to reply from my personal account. If I did it from my work > > account I'd be top-posting because of Outlook and that goes over like a > > lead balloon. > > > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can > > either send it via a bug on e1000.sourceforge.net or try sending it to > > todd.fujinaka@intel.com > > > > The other thing is I'm wondering is what the subvendor device ID you > > have is referring to because it's not in the pci database. Some ODMs > > like getting creative with what they put in the NVM. > > > > Todd Fujinaka (todd.fujinaka@intel.com) > > Thanks for the prompt reply. Dave, could you please provide the requested > information? sent off-list. Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-14 18:28 ` Fujinaka, Todd 0 siblings, 0 replies; 53+ messages in thread From: Fujinaka, Todd @ 2021-09-14 18:28 UTC (permalink / raw) To: intel-wired-lan Wow, I'm waiting for the hardware guy to look at this but this is an off-brand 10Gtek NIC from Amazon that just has nonsense data in the VPD as far as I can tell (3's). I'll let you know if I find out that the critical settings are bad, but I would just ignore any VPD errors. Todd Fujinaka Software Application Engineer Ethernet Products Group Intel Corporation todd.fujinaka at intel.com -----Original Message----- From: Dave Jones <davej@codemonkey.org.uk> Sent: Tuesday, September 14, 2021 7:24 AM To: Heiner Kallweit <hkallweit1@gmail.com> Cc: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; linux-pci at vger.kernel.org; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; intel-wired-lan <intel-wired-lan@lists.osuosl.org>; Bjorn Helgaas <bhelgaas@google.com>; Fujinaka, Todd <todd.fujinaka@intel.com>; Hisashi T Fujinaka <htodd@twofifty.com> Subject: Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: > > Sorry to reply from my personal account. If I did it from my work > > account I'd be top-posting because of Outlook and that goes over like a > > lead balloon. > > > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can > > either send it via a bug on e1000.sourceforge.net or try sending it to > > todd.fujinaka at intel.com > > > > The other thing is I'm wondering is what the subvendor device ID you > > have is referring to because it's not in the pci database. Some ODMs > > like getting creative with what they put in the NVM. > > > > Todd Fujinaka (todd.fujinaka at intel.com) > > Thanks for the prompt reply. Dave, could you please provide the requested > information? sent off-list. Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-14 14:24 ` Dave Jones @ 2021-09-14 20:00 ` Hisashi T Fujinaka -1 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-14 20:00 UTC (permalink / raw) To: Dave Jones Cc: Heiner Kallweit, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas On Tue, 14 Sep 2021, Dave Jones wrote: > On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: > > > > Sorry to reply from my personal account. If I did it from my work > > > account I'd be top-posting because of Outlook and that goes over like a > > > lead balloon. > > > > > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can > > > either send it via a bug on e1000.sourceforge.net or try sending it to > > > todd.fujinaka@intel.com > > > > > > The other thing is I'm wondering is what the subvendor device ID you > > > have is referring to because it's not in the pci database. Some ODMs > > > like getting creative with what they put in the NVM. > > > > > > Todd Fujinaka (todd.fujinaka@intel.com) > > > > Thanks for the prompt reply. Dave, could you please provide the requested > > information? > > sent off-list. > > Dave Whoops. I replied from outlook again. I have confirmation that this should be a valid image. The VPD is just a series of 3's. There are changes to preboot header, flash and BAR size, and as far as I can tell, a nonsense subdevice ID, but this should work. What was the original question? Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-14 20:00 ` Hisashi T Fujinaka 0 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-14 20:00 UTC (permalink / raw) To: intel-wired-lan On Tue, 14 Sep 2021, Dave Jones wrote: > On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: > > > > Sorry to reply from my personal account. If I did it from my work > > > account I'd be top-posting because of Outlook and that goes over like a > > > lead balloon. > > > > > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can > > > either send it via a bug on e1000.sourceforge.net or try sending it to > > > todd.fujinaka at intel.com > > > > > > The other thing is I'm wondering is what the subvendor device ID you > > > have is referring to because it's not in the pci database. Some ODMs > > > like getting creative with what they put in the NVM. > > > > > > Todd Fujinaka (todd.fujinaka at intel.com) > > > > Thanks for the prompt reply. Dave, could you please provide the requested > > information? > > sent off-list. > > Dave Whoops. I replied from outlook again. I have confirmation that this should be a valid image. The VPD is just a series of 3's. There are changes to preboot header, flash and BAR size, and as far as I can tell, a nonsense subdevice ID, but this should work. What was the original question? Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-14 20:00 ` Hisashi T Fujinaka @ 2021-09-14 21:51 ` Heiner Kallweit -1 siblings, 0 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-14 21:51 UTC (permalink / raw) To: Hisashi T Fujinaka, Dave Jones Cc: linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas On 14.09.2021 22:00, Hisashi T Fujinaka wrote: > On Tue, 14 Sep 2021, Dave Jones wrote: > >> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: >> >> > > Sorry to reply from my personal account. If I did it from my work >> > > account I'd be top-posting because of Outlook and that goes over like a >> > > lead balloon. >> > > >> > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can >> > > either send it via a bug on e1000.sourceforge.net or try sending it to >> > > todd.fujinaka@intel.com >> > > >> > > The other thing is I'm wondering is what the subvendor device ID you >> > > have is referring to because it's not in the pci database. Some ODMs >> > > like getting creative with what they put in the NVM. >> > > >> > > Todd Fujinaka (todd.fujinaka@intel.com) >> > >> > Thanks for the prompt reply. Dave, could you please provide the requested >> > information? >> >> sent off-list. >> >> Dave > > Whoops. I replied from outlook again. > > I have confirmation that this should be a valid image. The VPD is just a > series of 3's. There are changes to preboot header, flash and BAR size, > and as far as I can tell, a nonsense subdevice ID, but this should work. > > What was the original question? > "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD code resulted in a stall. So it seems the data doesn't have valid VPD format as defined in PCI specification. 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) Subsystem: Device 1dcf:030a ... Capabilities: [e0] Vital Product Data *Unknown small resource type 06, will not decode more.* Not sure which method is used by the driver to get the EEPROM content. For the issue here is relevant what is exposed via PCI VPD. The related kernel error message has been reported few times, e.g. here: https://access.redhat.com/solutions/3001451 Only due to a change in kernel code this became a more prominent issue now. You say that VPD is just a series of 3's. This may explain why kernel and tools complain about an invalid VPD format. VPD misses the tag structure. > Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-14 21:51 ` Heiner Kallweit 0 siblings, 0 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-14 21:51 UTC (permalink / raw) To: intel-wired-lan On 14.09.2021 22:00, Hisashi T Fujinaka wrote: > On Tue, 14 Sep 2021, Dave Jones wrote: > >> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: >> >> > > Sorry to reply from my personal account. If I did it from my work >> > > account I'd be top-posting because of Outlook and that goes over like a >> > > lead balloon. >> > > >> > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can >> > > either send it via a bug on e1000.sourceforge.net or try sending it to >> > > todd.fujinaka at intel.com >> > > >> > > The other thing is I'm wondering is what the subvendor device ID you >> > > have is referring to because it's not in the pci database. Some ODMs >> > > like getting creative with what they put in the NVM. >> > > >> > > Todd Fujinaka (todd.fujinaka at intel.com) >> > >> > Thanks for the prompt reply. Dave, could you please provide the requested >> > information? >> >> sent off-list. >> >> ????Dave > > Whoops. I replied from outlook again. > > I have confirmation that this should be a valid image. The VPD is just a > series of 3's. There are changes to preboot header, flash and BAR size, > and as far as I can tell, a nonsense subdevice ID, but this should work. > > What was the original question? > "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD code resulted in a stall. So it seems the data doesn't have valid VPD format as defined in PCI specification. 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) Subsystem: Device 1dcf:030a ... Capabilities: [e0] Vital Product Data *Unknown small resource type 06, will not decode more.* Not sure which method is used by the driver to get the EEPROM content. For the issue here is relevant what is exposed via PCI VPD. The related kernel error message has been reported few times, e.g. here: https://access.redhat.com/solutions/3001451 Only due to a change in kernel code this became a more prominent issue now. You say that VPD is just a series of 3's. This may explain why kernel and tools complain about an invalid VPD format. VPD misses the tag structure. > Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-14 21:51 ` Heiner Kallweit @ 2021-09-15 14:18 ` Hisashi T Fujinaka -1 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-15 14:18 UTC (permalink / raw) To: Heiner Kallweit Cc: Dave Jones, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas On Tue, 14 Sep 2021, Heiner Kallweit wrote: > On 14.09.2021 22:00, Hisashi T Fujinaka wrote: >> On Tue, 14 Sep 2021, Dave Jones wrote: >> >>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: >>> >>>>> Sorry to reply from my personal account. If I did it from my work >>>>> account I'd be top-posting because of Outlook and that goes over like a >>>>> lead balloon. >>>>> >>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can >>>>> either send it via a bug on e1000.sourceforge.net or try sending it to >>>>> todd.fujinaka@intel.com >>>>> >>>>> The other thing is I'm wondering is what the subvendor device ID you >>>>> have is referring to because it's not in the pci database. Some ODMs >>>>> like getting creative with what they put in the NVM. >>>>> >>>>> Todd Fujinaka (todd.fujinaka@intel.com) >>>> >>>> Thanks for the prompt reply. Dave, could you please provide the requested >>>> information? >>> >>> sent off-list. >>> >>> Dave >> >> Whoops. I replied from outlook again. >> >> I have confirmation that this should be a valid image. The VPD is just a >> series of 3's. There are changes to preboot header, flash and BAR size, >> and as far as I can tell, a nonsense subdevice ID, but this should work. >> >> What was the original question? >> > "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD > code resulted in a stall. So it seems the data doesn't have valid VPD > format as defined in PCI specification. > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > Subsystem: Device 1dcf:030a > ... > Capabilities: [e0] Vital Product Data > *Unknown small resource type 06, will not decode more.* > > Not sure which method is used by the driver to get the EEPROM content. > For the issue here is relevant what is exposed via PCI VPD. > > The related kernel error message has been reported few times, e.g. here: > https://access.redhat.com/solutions/3001451 > Only due to a change in kernel code this became a more prominent > issue now. > > You say that VPD is just a series of 3's. This may explain why kernel and > tools complain about an invalid VPD format. VPD misses the tag structure. I think I conflated two issues and yours may not be the one with the weird Amazon NIC. In any case, the VPD does not match the spec and two people have confirmed it's just full of 3's. With the bogus subvendor ID, I'm thinking this is not an Intel NIC. Next step is to contact whoever made the NIC and ask them for guidance. Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-15 14:18 ` Hisashi T Fujinaka 0 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-15 14:18 UTC (permalink / raw) To: intel-wired-lan On Tue, 14 Sep 2021, Heiner Kallweit wrote: > On 14.09.2021 22:00, Hisashi T Fujinaka wrote: >> On Tue, 14 Sep 2021, Dave Jones wrote: >> >>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: >>> >>>>> Sorry to reply from my personal account. If I did it from my work >>>>> account I'd be top-posting because of Outlook and that goes over like a >>>>> lead balloon. >>>>> >>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can >>>>> either send it via a bug on e1000.sourceforge.net or try sending it to >>>>> todd.fujinaka at intel.com >>>>> >>>>> The other thing is I'm wondering is what the subvendor device ID you >>>>> have is referring to because it's not in the pci database. Some ODMs >>>>> like getting creative with what they put in the NVM. >>>>> >>>>> Todd Fujinaka (todd.fujinaka at intel.com) >>>> >>>> Thanks for the prompt reply. Dave, could you please provide the requested >>>> information? >>> >>> sent off-list. >>> >>> Dave >> >> Whoops. I replied from outlook again. >> >> I have confirmation that this should be a valid image. The VPD is just a >> series of 3's. There are changes to preboot header, flash and BAR size, >> and as far as I can tell, a nonsense subdevice ID, but this should work. >> >> What was the original question? >> > "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD > code resulted in a stall. So it seems the data doesn't have valid VPD > format as defined in PCI specification. > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > Subsystem: Device 1dcf:030a > ... > Capabilities: [e0] Vital Product Data > *Unknown small resource type 06, will not decode more.* > > Not sure which method is used by the driver to get the EEPROM content. > For the issue here is relevant what is exposed via PCI VPD. > > The related kernel error message has been reported few times, e.g. here: > https://access.redhat.com/solutions/3001451 > Only due to a change in kernel code this became a more prominent > issue now. > > You say that VPD is just a series of 3's. This may explain why kernel and > tools complain about an invalid VPD format. VPD misses the tag structure. I think I conflated two issues and yours may not be the one with the weird Amazon NIC. In any case, the VPD does not match the spec and two people have confirmed it's just full of 3's. With the bogus subvendor ID, I'm thinking this is not an Intel NIC. Next step is to contact whoever made the NIC and ask them for guidance. Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-15 14:18 ` Hisashi T Fujinaka @ 2021-09-15 16:05 ` Heiner Kallweit -1 siblings, 0 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-15 16:05 UTC (permalink / raw) To: Hisashi T Fujinaka Cc: Dave Jones, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas On 15.09.2021 16:18, Hisashi T Fujinaka wrote: > On Tue, 14 Sep 2021, Heiner Kallweit wrote: > >> On 14.09.2021 22:00, Hisashi T Fujinaka wrote: >>> On Tue, 14 Sep 2021, Dave Jones wrote: >>> >>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: >>>> >>>>>> Sorry to reply from my personal account. If I did it from my work >>>>>> account I'd be top-posting because of Outlook and that goes over like a >>>>>> lead balloon. >>>>>> >>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can >>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to >>>>>> todd.fujinaka@intel.com >>>>>> >>>>>> The other thing is I'm wondering is what the subvendor device ID you >>>>>> have is referring to because it's not in the pci database. Some ODMs >>>>>> like getting creative with what they put in the NVM. >>>>>> >>>>>> Todd Fujinaka (todd.fujinaka@intel.com) >>>>> >>>>> Thanks for the prompt reply. Dave, could you please provide the requested >>>>> information? >>>> >>>> sent off-list. >>>> >>>> Dave >>> >>> Whoops. I replied from outlook again. >>> >>> I have confirmation that this should be a valid image. The VPD is just a >>> series of 3's. There are changes to preboot header, flash and BAR size, >>> and as far as I can tell, a nonsense subdevice ID, but this should work. >>> >>> What was the original question? >>> >> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD >> code resulted in a stall. So it seems the data doesn't have valid VPD >> format as defined in PCI specification. >> >> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >> Subsystem: Device 1dcf:030a >> ... >> Capabilities: [e0] Vital Product Data >> *Unknown small resource type 06, will not decode more.* >> >> Not sure which method is used by the driver to get the EEPROM content. >> For the issue here is relevant what is exposed via PCI VPD. >> >> The related kernel error message has been reported few times, e.g. here: >> https://access.redhat.com/solutions/3001451 >> Only due to a change in kernel code this became a more prominent >> issue now. >> >> You say that VPD is just a series of 3's. This may explain why kernel and >> tools complain about an invalid VPD format. VPD misses the tag structure. > > I think I conflated two issues and yours may not be the one with the > weird Amazon NIC. In any case, the VPD does not match the spec and two > people have confirmed it's just full of 3's. With the bogus subvendor > ID, I'm thinking this is not an Intel NIC. > > Next step is to contact whoever made the NIC and ask them for guidance. > In an earlier mail in this thread was stated that subvendor id is unknown. Checking here https://pcisig.com/membership/member-companies?combine=1dcf it says: Beijing Sinead Technology Co., Ltd. > Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-15 16:05 ` Heiner Kallweit 0 siblings, 0 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-15 16:05 UTC (permalink / raw) To: intel-wired-lan On 15.09.2021 16:18, Hisashi T Fujinaka wrote: > On Tue, 14 Sep 2021, Heiner Kallweit wrote: > >> On 14.09.2021 22:00, Hisashi T Fujinaka wrote: >>> On Tue, 14 Sep 2021, Dave Jones wrote: >>> >>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: >>>> >>>>>> Sorry to reply from my personal account. If I did it from my work >>>>>> account I'd be top-posting because of Outlook and that goes over like a >>>>>> lead balloon. >>>>>> >>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can >>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to >>>>>> todd.fujinaka at intel.com >>>>>> >>>>>> The other thing is I'm wondering is what the subvendor device ID you >>>>>> have is referring to because it's not in the pci database. Some ODMs >>>>>> like getting creative with what they put in the NVM. >>>>>> >>>>>> Todd Fujinaka (todd.fujinaka at intel.com) >>>>> >>>>> Thanks for the prompt reply. Dave, could you please provide the requested >>>>> information? >>>> >>>> sent off-list. >>>> >>>> ??? Dave >>> >>> Whoops. I replied from outlook again. >>> >>> I have confirmation that this should be a valid image. The VPD is just a >>> series of 3's. There are changes to preboot header, flash and BAR size, >>> and as far as I can tell, a nonsense subdevice ID, but this should work. >>> >>> What was the original question? >>> >> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD >> code resulted in a stall. So it seems the data doesn't have valid VPD >> format as defined in PCI specification. >> >> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >> ?????? Subsystem: Device 1dcf:030a >> ????... >> ??????????? Capabilities: [e0] Vital Product Data >> ?????????????? *Unknown small resource type 06, will not decode more.* >> >> Not sure which method is used by the driver to get the EEPROM content. >> For the issue here is relevant what is exposed via PCI VPD. >> >> The related kernel error message has been reported few times, e.g. here: >> https://access.redhat.com/solutions/3001451 >> Only due to a change in kernel code this became a more prominent >> issue now. >> >> You say that VPD is just a series of 3's. This may explain why kernel and >> tools complain about an invalid VPD format. VPD misses the tag structure. > > I think I conflated two issues and yours may not be the one with the > weird Amazon NIC. In any case, the VPD does not match the spec and two > people have confirmed it's just full of 3's. With the bogus subvendor > ID, I'm thinking this is not an Intel NIC. > > Next step is to contact whoever made the NIC and ask them for guidance. > In an earlier mail in this thread was stated that subvendor id is unknown. Checking here https://pcisig.com/membership/member-companies?combine=1dcf it says: Beijing Sinead Technology Co., Ltd. > Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-15 16:05 ` Heiner Kallweit @ 2021-09-15 16:16 ` Hisashi T Fujinaka -1 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-15 16:16 UTC (permalink / raw) To: Heiner Kallweit Cc: Dave Jones, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas On Wed, 15 Sep 2021, Heiner Kallweit wrote: > On 15.09.2021 16:18, Hisashi T Fujinaka wrote: >> On Tue, 14 Sep 2021, Heiner Kallweit wrote: >> >>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote: >>>> On Tue, 14 Sep 2021, Dave Jones wrote: >>>> >>>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: >>>>> >>>>>>> Sorry to reply from my personal account. If I did it from my work >>>>>>> account I'd be top-posting because of Outlook and that goes over like a >>>>>>> lead balloon. >>>>>>> >>>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can >>>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to >>>>>>> todd.fujinaka@intel.com >>>>>>> >>>>>>> The other thing is I'm wondering is what the subvendor device ID you >>>>>>> have is referring to because it's not in the pci database. Some ODMs >>>>>>> like getting creative with what they put in the NVM. >>>>>>> >>>>>>> Todd Fujinaka (todd.fujinaka@intel.com) >>>>>> >>>>>> Thanks for the prompt reply. Dave, could you please provide the requested >>>>>> information? >>>>> >>>>> sent off-list. >>>>> >>>>> Dave >>>> >>>> Whoops. I replied from outlook again. >>>> >>>> I have confirmation that this should be a valid image. The VPD is just a >>>> series of 3's. There are changes to preboot header, flash and BAR size, >>>> and as far as I can tell, a nonsense subdevice ID, but this should work. >>>> >>>> What was the original question? >>>> >>> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD >>> code resulted in a stall. So it seems the data doesn't have valid VPD >>> format as defined in PCI specification. >>> >>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >>> Subsystem: Device 1dcf:030a >>> ... >>> Capabilities: [e0] Vital Product Data >>> *Unknown small resource type 06, will not decode more.* >>> >>> Not sure which method is used by the driver to get the EEPROM content. >>> For the issue here is relevant what is exposed via PCI VPD. >>> >>> The related kernel error message has been reported few times, e.g. here: >>> https://access.redhat.com/solutions/3001451 >>> Only due to a change in kernel code this became a more prominent >>> issue now. >>> >>> You say that VPD is just a series of 3's. This may explain why kernel and >>> tools complain about an invalid VPD format. VPD misses the tag structure. >> >> I think I conflated two issues and yours may not be the one with the >> weird Amazon NIC. In any case, the VPD does not match the spec and two >> people have confirmed it's just full of 3's. With the bogus subvendor >> ID, I'm thinking this is not an Intel NIC. >> >> Next step is to contact whoever made the NIC and ask them for guidance. >> > In an earlier mail in this thread was stated that subvendor id is unknown. > Checking here https://pcisig.com/membership/member-companies?combine=1dcf > it says: Beijing Sinead Technology Co., Ltd. Huh. I didn't realize there was an official list beyond pciids.ucw.cz. In any case, that's who you need to talk to about the unlisted (to Linux) vendor ID and also the odd VPD data. -- Hisashi T Fujinaka - htodd@twofifty.com BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-15 16:16 ` Hisashi T Fujinaka 0 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-15 16:16 UTC (permalink / raw) To: intel-wired-lan On Wed, 15 Sep 2021, Heiner Kallweit wrote: > On 15.09.2021 16:18, Hisashi T Fujinaka wrote: >> On Tue, 14 Sep 2021, Heiner Kallweit wrote: >> >>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote: >>>> On Tue, 14 Sep 2021, Dave Jones wrote: >>>> >>>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote: >>>>> >>>>>>> Sorry to reply from my personal account. If I did it from my work >>>>>>> account I'd be top-posting because of Outlook and that goes over like a >>>>>>> lead balloon. >>>>>>> >>>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can >>>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to >>>>>>> todd.fujinaka at intel.com >>>>>>> >>>>>>> The other thing is I'm wondering is what the subvendor device ID you >>>>>>> have is referring to because it's not in the pci database. Some ODMs >>>>>>> like getting creative with what they put in the NVM. >>>>>>> >>>>>>> Todd Fujinaka (todd.fujinaka at intel.com) >>>>>> >>>>>> Thanks for the prompt reply. Dave, could you please provide the requested >>>>>> information? >>>>> >>>>> sent off-list. >>>>> >>>>> Dave >>>> >>>> Whoops. I replied from outlook again. >>>> >>>> I have confirmation that this should be a valid image. The VPD is just a >>>> series of 3's. There are changes to preboot header, flash and BAR size, >>>> and as far as I can tell, a nonsense subdevice ID, but this should work. >>>> >>>> What was the original question? >>>> >>> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD >>> code resulted in a stall. So it seems the data doesn't have valid VPD >>> format as defined in PCI specification. >>> >>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >>> Subsystem: Device 1dcf:030a >>> ... >>> Capabilities: [e0] Vital Product Data >>> *Unknown small resource type 06, will not decode more.* >>> >>> Not sure which method is used by the driver to get the EEPROM content. >>> For the issue here is relevant what is exposed via PCI VPD. >>> >>> The related kernel error message has been reported few times, e.g. here: >>> https://access.redhat.com/solutions/3001451 >>> Only due to a change in kernel code this became a more prominent >>> issue now. >>> >>> You say that VPD is just a series of 3's. This may explain why kernel and >>> tools complain about an invalid VPD format. VPD misses the tag structure. >> >> I think I conflated two issues and yours may not be the one with the >> weird Amazon NIC. In any case, the VPD does not match the spec and two >> people have confirmed it's just full of 3's. With the bogus subvendor >> ID, I'm thinking this is not an Intel NIC. >> >> Next step is to contact whoever made the NIC and ask them for guidance. >> > In an earlier mail in this thread was stated that subvendor id is unknown. > Checking here https://pcisig.com/membership/member-companies?combine=1dcf > it says: Beijing Sinead Technology Co., Ltd. Huh. I didn't realize there was an official list beyond pciids.ucw.cz. In any case, that's who you need to talk to about the unlisted (to Linux) vendor ID and also the odd VPD data. -- Hisashi T Fujinaka - htodd at twofifty.com BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-15 16:16 ` Hisashi T Fujinaka @ 2021-09-15 22:32 ` Bjorn Helgaas -1 siblings, 0 replies; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-15 22:32 UTC (permalink / raw) To: Hisashi T Fujinaka Cc: Heiner Kallweit, Dave Jones, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote: > On Wed, 15 Sep 2021, Heiner Kallweit wrote: > > On 15.09.2021 16:18, Hisashi T Fujinaka wrote: > > > On Tue, 14 Sep 2021, Heiner Kallweit wrote: > > > > On 14.09.2021 22:00, Hisashi T Fujinaka wrote: > > > > > I have confirmation that this should be a valid image. The > > > > > VPD is just a series of 3's. There are changes to preboot > > > > > header, flash and BAR size, and as far as I can tell, a > > > > > nonsense subdevice ID, but this should work. > > > > > > > > > > What was the original question? > > > > > > > > > "lspci -vv" complains about an invalid short tag 0x06 and the > > > > PCI VPD code resulted in a stall. So it seems the data doesn't > > > > have valid VPD format as defined in PCI specification. > > > > > > > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > > > > Subsystem: Device 1dcf:030a > > > > ... > > > > Capabilities: [e0] Vital Product Data > > > > *Unknown small resource type 06, will not decode more.* > > > > > > > > Not sure which method is used by the driver to get the EEPROM > > > > content. For the issue here is relevant what is exposed via > > > > PCI VPD. > > > > > > > > The related kernel error message has been reported few times, > > > > e.g. here: https://access.redhat.com/solutions/3001451 Only > > > > due to a change in kernel code this became a more prominent > > > > issue now. > > > > > > > > You say that VPD is just a series of 3's. This may explain why > > > > kernel and tools complain about an invalid VPD format. VPD > > > > misses the tag structure. > > > > > > I think I conflated two issues and yours may not be the one with the > > > weird Amazon NIC. In any case, the VPD does not match the spec and two > > > people have confirmed it's just full of 3's. With the bogus subvendor > > > ID, I'm thinking this is not an Intel NIC. A series of 0x03 0x03 0x03 ... bytes would decode as "small items of type 00", so I assume the VPD contains a series of 0x33 0x33 0x33 ... bytes. That would decode to a series of "small items of type 06", each of length four (one byte for the tag, three bytes of data). Prior to v5.15, we would complain "invalid short VPD tag 06" and stop reading. As of v5.15, I think we'll just keep reading looking for a valid "end" tag, but we'll never find one. I think in v5.15 there will be no error message because the series of four-byte small data items happens to fit exactly in the maximum 32KB size of VPD and is technically legal syntactic structure, even though it makes no sense. But it will be much slower and might account for the boot slowdown Dave reported. > > In an earlier mail in this thread was stated that subvendor id is unknown. > > Checking here https://pcisig.com/membership/member-companies?combine=1dcf > > it says: Beijing Sinead Technology Co., Ltd. > > Huh. I didn't realize there was an official list beyond pciids.ucw.cz. > > In any case, that's who you need to talk to about the unlisted (to > Linux) vendor ID and also the odd VPD data. https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it is basically crowd-sourced data, not the "official" Vendor IDs controlled by the PCI SIG. I submitted an addition to https://pci-ids.ucw.cz/ Bjorn ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-15 22:32 ` Bjorn Helgaas 0 siblings, 0 replies; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-15 22:32 UTC (permalink / raw) To: intel-wired-lan On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote: > On Wed, 15 Sep 2021, Heiner Kallweit wrote: > > On 15.09.2021 16:18, Hisashi T Fujinaka wrote: > > > On Tue, 14 Sep 2021, Heiner Kallweit wrote: > > > > On 14.09.2021 22:00, Hisashi T Fujinaka wrote: > > > > > I have confirmation that this should be a valid image. The > > > > > VPD is just a series of 3's. There are changes to preboot > > > > > header, flash and BAR size, and as far as I can tell, a > > > > > nonsense subdevice ID, but this should work. > > > > > > > > > > What was the original question? > > > > > > > > > "lspci -vv" complains about an invalid short tag 0x06 and the > > > > PCI VPD code resulted in a stall. So it seems the data doesn't > > > > have valid VPD format as defined in PCI specification. > > > > > > > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > > > > Subsystem: Device 1dcf:030a > > > > ... > > > > Capabilities: [e0] Vital Product Data > > > > *Unknown small resource type 06, will not decode more.* > > > > > > > > Not sure which method is used by the driver to get the EEPROM > > > > content. For the issue here is relevant what is exposed via > > > > PCI VPD. > > > > > > > > The related kernel error message has been reported few times, > > > > e.g. here: https://access.redhat.com/solutions/3001451 Only > > > > due to a change in kernel code this became a more prominent > > > > issue now. > > > > > > > > You say that VPD is just a series of 3's. This may explain why > > > > kernel and tools complain about an invalid VPD format. VPD > > > > misses the tag structure. > > > > > > I think I conflated two issues and yours may not be the one with the > > > weird Amazon NIC. In any case, the VPD does not match the spec and two > > > people have confirmed it's just full of 3's. With the bogus subvendor > > > ID, I'm thinking this is not an Intel NIC. A series of 0x03 0x03 0x03 ... bytes would decode as "small items of type 00", so I assume the VPD contains a series of 0x33 0x33 0x33 ... bytes. That would decode to a series of "small items of type 06", each of length four (one byte for the tag, three bytes of data). Prior to v5.15, we would complain "invalid short VPD tag 06" and stop reading. As of v5.15, I think we'll just keep reading looking for a valid "end" tag, but we'll never find one. I think in v5.15 there will be no error message because the series of four-byte small data items happens to fit exactly in the maximum 32KB size of VPD and is technically legal syntactic structure, even though it makes no sense. But it will be much slower and might account for the boot slowdown Dave reported. > > In an earlier mail in this thread was stated that subvendor id is unknown. > > Checking here https://pcisig.com/membership/member-companies?combine=1dcf > > it says: Beijing Sinead Technology Co., Ltd. > > Huh. I didn't realize there was an official list beyond pciids.ucw.cz. > > In any case, that's who you need to talk to about the unlisted (to > Linux) vendor ID and also the odd VPD data. https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it is basically crowd-sourced data, not the "official" Vendor IDs controlled by the PCI SIG. I submitted an addition to https://pci-ids.ucw.cz/ Bjorn ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-15 22:32 ` Bjorn Helgaas @ 2021-09-15 23:46 ` Hisashi T Fujinaka -1 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-15 23:46 UTC (permalink / raw) To: Bjorn Helgaas Cc: Dave Jones, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas, Heiner Kallweit On Wed, 15 Sep 2021, Bjorn Helgaas wrote: > On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote: >> On Wed, 15 Sep 2021, Heiner Kallweit wrote: >>> On 15.09.2021 16:18, Hisashi T Fujinaka wrote: >>>> On Tue, 14 Sep 2021, Heiner Kallweit wrote: >>>>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote: > >>>>>> I have confirmation that this should be a valid image. The >>>>>> VPD is just a series of 3's. There are changes to preboot >>>>>> header, flash and BAR size, and as far as I can tell, a >>>>>> nonsense subdevice ID, but this should work. >>>>>> >>>>>> What was the original question? >>>>>> >>>>> "lspci -vv" complains about an invalid short tag 0x06 and the >>>>> PCI VPD code resulted in a stall. So it seems the data doesn't >>>>> have valid VPD format as defined in PCI specification. >>>>> >>>>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >>>>> Subsystem: Device 1dcf:030a >>>>> ... >>>>> Capabilities: [e0] Vital Product Data >>>>> *Unknown small resource type 06, will not decode more.* >>>>> >>>>> Not sure which method is used by the driver to get the EEPROM >>>>> content. For the issue here is relevant what is exposed via >>>>> PCI VPD. >>>>> >>>>> The related kernel error message has been reported few times, >>>>> e.g. here: https://access.redhat.com/solutions/3001451 Only >>>>> due to a change in kernel code this became a more prominent >>>>> issue now. >>>>> >>>>> You say that VPD is just a series of 3's. This may explain why >>>>> kernel and tools complain about an invalid VPD format. VPD >>>>> misses the tag structure. >>>> >>>> I think I conflated two issues and yours may not be the one with the >>>> weird Amazon NIC. In any case, the VPD does not match the spec and two >>>> people have confirmed it's just full of 3's. With the bogus subvendor >>>> ID, I'm thinking this is not an Intel NIC. > > A series of 0x03 0x03 0x03 ... bytes would decode as "small items of > type 00", so I assume the VPD contains a series of 0x33 0x33 0x33 ... > bytes. That would decode to a series of "small items of type 06", > each of length four (one byte for the tag, three bytes of data). > > Prior to v5.15, we would complain "invalid short VPD tag 06" and stop > reading. As of v5.15, I think we'll just keep reading looking for a > valid "end" tag, but we'll never find one. > > I think in v5.15 there will be no error message because the series of > four-byte small data items happens to fit exactly in the maximum 32KB > size of VPD and is technically legal syntactic structure, even though > it makes no sense. > > But it will be much slower and might account for the boot slowdown > Dave reported. > >>> In an earlier mail in this thread was stated that subvendor id is unknown. >>> Checking here https://pcisig.com/membership/member-companies?combine=1dcf >>> it says: Beijing Sinead Technology Co., Ltd. >> >> Huh. I didn't realize there was an official list beyond pciids.ucw.cz. >> >> In any case, that's who you need to talk to about the unlisted (to >> Linux) vendor ID and also the odd VPD data. > > https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it > is basically crowd-sourced data, not the "official" Vendor IDs > controlled by the PCI SIG. > > I submitted an addition to https://pci-ids.ucw.cz/ > > Bjorn Just for my edumacation, do they keep track of device IDs, subvendor IDs (which are probably just the same as vendor IDs), and subdevice IDs in the PCI SIG? Or even the branding strings? Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-15 23:46 ` Hisashi T Fujinaka 0 siblings, 0 replies; 53+ messages in thread From: Hisashi T Fujinaka @ 2021-09-15 23:46 UTC (permalink / raw) To: intel-wired-lan On Wed, 15 Sep 2021, Bjorn Helgaas wrote: > On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote: >> On Wed, 15 Sep 2021, Heiner Kallweit wrote: >>> On 15.09.2021 16:18, Hisashi T Fujinaka wrote: >>>> On Tue, 14 Sep 2021, Heiner Kallweit wrote: >>>>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote: > >>>>>> I have confirmation that this should be a valid image. The >>>>>> VPD is just a series of 3's. There are changes to preboot >>>>>> header, flash and BAR size, and as far as I can tell, a >>>>>> nonsense subdevice ID, but this should work. >>>>>> >>>>>> What was the original question? >>>>>> >>>>> "lspci -vv" complains about an invalid short tag 0x06 and the >>>>> PCI VPD code resulted in a stall. So it seems the data doesn't >>>>> have valid VPD format as defined in PCI specification. >>>>> >>>>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) >>>>> Subsystem: Device 1dcf:030a >>>>> ... >>>>> Capabilities: [e0] Vital Product Data >>>>> *Unknown small resource type 06, will not decode more.* >>>>> >>>>> Not sure which method is used by the driver to get the EEPROM >>>>> content. For the issue here is relevant what is exposed via >>>>> PCI VPD. >>>>> >>>>> The related kernel error message has been reported few times, >>>>> e.g. here: https://access.redhat.com/solutions/3001451 Only >>>>> due to a change in kernel code this became a more prominent >>>>> issue now. >>>>> >>>>> You say that VPD is just a series of 3's. This may explain why >>>>> kernel and tools complain about an invalid VPD format. VPD >>>>> misses the tag structure. >>>> >>>> I think I conflated two issues and yours may not be the one with the >>>> weird Amazon NIC. In any case, the VPD does not match the spec and two >>>> people have confirmed it's just full of 3's. With the bogus subvendor >>>> ID, I'm thinking this is not an Intel NIC. > > A series of 0x03 0x03 0x03 ... bytes would decode as "small items of > type 00", so I assume the VPD contains a series of 0x33 0x33 0x33 ... > bytes. That would decode to a series of "small items of type 06", > each of length four (one byte for the tag, three bytes of data). > > Prior to v5.15, we would complain "invalid short VPD tag 06" and stop > reading. As of v5.15, I think we'll just keep reading looking for a > valid "end" tag, but we'll never find one. > > I think in v5.15 there will be no error message because the series of > four-byte small data items happens to fit exactly in the maximum 32KB > size of VPD and is technically legal syntactic structure, even though > it makes no sense. > > But it will be much slower and might account for the boot slowdown > Dave reported. > >>> In an earlier mail in this thread was stated that subvendor id is unknown. >>> Checking here https://pcisig.com/membership/member-companies?combine=1dcf >>> it says: Beijing Sinead Technology Co., Ltd. >> >> Huh. I didn't realize there was an official list beyond pciids.ucw.cz. >> >> In any case, that's who you need to talk to about the unlisted (to >> Linux) vendor ID and also the odd VPD data. > > https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it > is basically crowd-sourced data, not the "official" Vendor IDs > controlled by the PCI SIG. > > I submitted an addition to https://pci-ids.ucw.cz/ > > Bjorn Just for my edumacation, do they keep track of device IDs, subvendor IDs (which are probably just the same as vendor IDs), and subdevice IDs in the PCI SIG? Or even the branding strings? Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue 2021-09-15 23:46 ` Hisashi T Fujinaka @ 2021-09-17 15:09 ` Bjorn Helgaas -1 siblings, 0 replies; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-17 15:09 UTC (permalink / raw) To: Hisashi T Fujinaka Cc: Dave Jones, linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas, Heiner Kallweit On Wed, Sep 15, 2021 at 04:46:54PM -0700, Hisashi T Fujinaka wrote: > On Wed, 15 Sep 2021, Bjorn Helgaas wrote: > > > On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote: > > > On Wed, 15 Sep 2021, Heiner Kallweit wrote: > > > > In an earlier mail in this thread was stated that subvendor id is unknown. > > > > Checking here https://pcisig.com/membership/member-companies?combine=1dcf > > > > it says: Beijing Sinead Technology Co., Ltd. > > > > > > Huh. I didn't realize there was an official list beyond pciids.ucw.cz. > > > > > > In any case, that's who you need to talk to about the unlisted (to > > > Linux) vendor ID and also the odd VPD data. > > > > https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it > > is basically crowd-sourced data, not the "official" Vendor IDs > > controlled by the PCI SIG. > > > > I submitted an addition to https://pci-ids.ucw.cz/ > > > > Bjorn > > Just for my edumacation, do they keep track of device IDs, subvendor IDs > (which are probably just the same as vendor IDs), and subdevice IDs in > the PCI SIG? Or even the branding strings? The PCI SIG does not manage Device IDs. That's the responsibility of the vendor. ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue @ 2021-09-17 15:09 ` Bjorn Helgaas 0 siblings, 0 replies; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-17 15:09 UTC (permalink / raw) To: intel-wired-lan On Wed, Sep 15, 2021 at 04:46:54PM -0700, Hisashi T Fujinaka wrote: > On Wed, 15 Sep 2021, Bjorn Helgaas wrote: > > > On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote: > > > On Wed, 15 Sep 2021, Heiner Kallweit wrote: > > > > In an earlier mail in this thread was stated that subvendor id is unknown. > > > > Checking here https://pcisig.com/membership/member-companies?combine=1dcf > > > > it says: Beijing Sinead Technology Co., Ltd. > > > > > > Huh. I didn't realize there was an official list beyond pciids.ucw.cz. > > > > > > In any case, that's who you need to talk to about the unlisted (to > > > Linux) vendor ID and also the odd VPD data. > > > > https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it > > is basically crowd-sourced data, not the "official" Vendor IDs > > controlled by the PCI SIG. > > > > I submitted an addition to https://pci-ids.ucw.cz/ > > > > Bjorn > > Just for my edumacation, do they keep track of device IDs, subvendor IDs > (which are probably just the same as vendor IDs), and subdevice IDs in > the PCI SIG? Or even the branding strings? The PCI SIG does not manage Device IDs. That's the responsibility of the vendor. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 20:32 ` Dave Jones 2021-09-13 20:44 ` [Intel-wired-lan] " Heiner Kallweit @ 2021-09-13 20:59 ` Heiner Kallweit 2021-09-13 23:35 ` Bjorn Helgaas 1 sibling, 1 reply; 53+ messages in thread From: Heiner Kallweit @ 2021-09-13 20:59 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci On 13.09.2021 22:32, Dave Jones wrote: > On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote: > > > > This didn't help I'm afraid :( > > > It changed the VPD warning, but that's about it... > > > > > > [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 > > > [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) > > > [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs > > > > > With this patch there's no VPD access to this device any longer. So this can't be > > the root cause. Do you have any other PCI device that has VPD capability? > > -> Capabilities: [...] Vital Product Data > > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > Subsystem: Device 1dcf:030a > ... > Capabilities: [e0] Vital Product Data > Unknown small resource type 06, will not decode more. > The stall being discussed would have been prevented by the VPD tag verification in pci_vpd_size(). It seems that now random data is interpreted as VPD tags what results in VPD access to an address that makes the device stall. I do not really follow Linus' argumentation that VPD shouldn't be accessed during boot because other slow "VPD-like" devices are accessed too, e.g. DDR SPD via I2C. > > I'll add that to the quirk list and see if that helps. > > Dave > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 20:59 ` Linux 5.15-rc1 Heiner Kallweit @ 2021-09-13 23:35 ` Bjorn Helgaas 0 siblings, 0 replies; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-13 23:35 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Bjorn Helgaas, linux-pci On Mon, Sep 13, 2021 at 10:59:05PM +0200, Heiner Kallweit wrote: > On 13.09.2021 22:32, Dave Jones wrote: > > On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote: > > > > > > This didn't help I'm afraid :( > > > > It changed the VPD warning, but that's about it... > > > > > > > > [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 > > > > [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) > > > > [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs > > > > > > > With this patch there's no VPD access to this device any longer. So this can't be > > > the root cause. Do you have any other PCI device that has VPD capability? > > > -> Capabilities: [...] Vital Product Data > > > > > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > > Subsystem: Device 1dcf:030a > > ... > > Capabilities: [e0] Vital Product Data > > Unknown small resource type 06, will not decode more. > > > > The stall being discussed would have been prevented by the VPD tag > verification in pci_vpd_size(). It seems that now random data is > interpreted as VPD tags what results in VPD access to an address > that makes the device stall. It's possible we need to validate more. But that's not the solution to the current problem. > I do not really follow Linus' argumentation that VPD shouldn't be > accessed during boot because other slow "VPD-like" devices are > accessed too, e.g. DDR SPD via I2C. I do. If we don't *need* it during boot, there's no reason to read it then. It only slows down boot. Bjorn ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 20:15 ` Dave Jones 2021-09-13 20:22 ` Heiner Kallweit @ 2021-09-13 20:32 ` Heiner Kallweit 2021-09-13 20:36 ` Linus Torvalds 2021-09-13 20:41 ` Dave Jones 1 sibling, 2 replies; 53+ messages in thread From: Heiner Kallweit @ 2021-09-13 20:32 UTC (permalink / raw) To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On 13.09.2021 22:15, Dave Jones wrote: > On Mon, Sep 13, 2021 at 08:59:49PM +0200, Heiner Kallweit wrote: > > On 13.09.2021 16:18, Dave Jones wrote: > > > [ 186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601 > > > [ 186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff] > > > [ 186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref] > > > [ 186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold > > > [ 186.735107] pci 0000:02:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update > > > > Thanks for the report! The stalls may be related to this one. Device is: > > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) > > > > With an older kernel you may experience the stall when accessing the vpd > > attribute of this device in sysfs. > > > > Maybe the device indicates VPD capability but doesn't actually support it. > > Could you please provide the "lspci -vv" output for this device? > > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0]) > Subsystem: Samsung Electronics Co Ltd XP941 PCIe SSD > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0, Cache Line Size: 64 bytes > Interrupt: pin A routed to IRQ 16 > Region 5: Memory at dfc10000 (32-bit, non-prefetchable) [size=8K] > Expansion ROM at dfc00000 [disabled] [size=64K] > Capabilities: [40] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [70] Express (v2) Endpoint, MSI 00 > DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset- > MaxPayload 128 bytes, MaxReadReq 512 bytes > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- > LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us > ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ > LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 5GT/s (ok), Width x2 (downgraded) > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+ > 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- > EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- > FRS- TPHComp- ExtTPHComp- > AtomicOpsCap: 32bit- 64bit- 128bitCAS- > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, > AtomicOpsCtl: ReqEn- > LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS- > LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- > EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- > Retimer- 2Retimers- CrosslinkRes: unsupported > Capabilities: [d0] Vital Product Data > Not readable > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ > AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- > MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- > HeaderLog: 00000000 00000000 00000000 00000000 > Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00 > Capabilities: [150 v1] Power Budgeting <?> > Capabilities: [160 v1] Latency Tolerance Reporting > Max snoop latency: 71680ns > Max no snoop latency: 71680ns > Kernel driver in use: ahci > > > > And could you please test with the following applied to verify the > > assumption? It disables VPD access for this device. > > > > --- > > drivers/pci/vpd.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c > > index 517789205..fc92e880e 100644 > > --- a/drivers/pci/vpd.c > > +++ b/drivers/pci/vpd.c > > @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd); > > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd); > > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd); > > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); > > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd); > > /* > > * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port > > * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class. > > > This didn't help I'm afraid :( > It changed the VPD warning, but that's about it... > > [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 > [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) > [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs > OK, so this device is buggy too but not the root cause. After checking again the stalls happen for VPD access to both ports of the Intel network adapter. 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) 01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) I modified the test patch accordingly, could you please test again? --- drivers/pci/vpd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c index 517789205..fc92e880e 100644 --- a/drivers/pci/vpd.c +++ b/drivers/pci/vpd.c @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x10fb, quirk_blacklist_vpd); /* * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class. -- 2.33.0 ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 20:32 ` Heiner Kallweit @ 2021-09-13 20:36 ` Linus Torvalds 2021-09-13 20:41 ` Dave Jones 1 sibling, 0 replies; 53+ messages in thread From: Linus Torvalds @ 2021-09-13 20:36 UTC (permalink / raw) To: Heiner Kallweit Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On Mon, Sep 13, 2021 at 1:33 PM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > OK, so this device is buggy too but not the root cause. After checking again the > stalls happen for VPD access to both ports of the Intel network adapter. I really don't want to add quirks like this. If this happens to a major manufacturer (whoe _defined_ the PCI standard, for chrissake), then this is not something where we want a quirk to avoid a boot-time delay. That commit needs to be reverted. If reverting it is hard, then we need to revert everything it depends on. Linus ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 20:32 ` Heiner Kallweit 2021-09-13 20:36 ` Linus Torvalds @ 2021-09-13 20:41 ` Dave Jones 1 sibling, 0 replies; 53+ messages in thread From: Dave Jones @ 2021-09-13 20:41 UTC (permalink / raw) To: Heiner Kallweit Cc: Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci On Mon, Sep 13, 2021 at 10:32:56PM +0200, Heiner Kallweit wrote: > > [ 184.235496] pci 0000:02:00.0: calling quirk_blacklist_vpd+0x0/0x22 @ 1 > > [ 184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) > > [ 184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs > > > > OK, so this device is buggy too but not the root cause. After checking again the > stalls happen for VPD access to both ports of the Intel network adapter. > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > 01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) > > I modified the test patch accordingly, could you please test again? That does avoid the stall. Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 14:18 ` Dave Jones 2021-09-13 18:59 ` Heiner Kallweit @ 2021-09-13 23:46 ` Bjorn Helgaas 2021-09-14 0:39 ` Dave Jones 2021-09-14 6:21 ` Heiner Kallweit 1 sibling, 2 replies; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-13 23:46 UTC (permalink / raw) To: Dave Jones, Linus Torvalds, Heiner Kallweit; +Cc: linux-kernel, linux-pci On Mon, Sep 13, 2021 at 10:18:18AM -0400, Dave Jones wrote: > On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote: > > So 5.15 isn't shaping up to be a particularly large release, at least > > in number of commits. At only just over 10k non-merge commits, this is > > in fact the smallest rc1 we have had in the 5.x series. We're usually > > hovering in the 12-14k commit range. > > This release takes over two minutes longer to boot on one my > machines than 5.14. The time just seems to be unaccounted for, even > with initcall_debug > ... > [ 2.194093] pci 0000:01:00.0: calling quirk_f0_vpd_link+0x0/0x60 @ 1 > [ 2.194097] pci 0000:01:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs > [ 2.194100] pci 0000:01:00.0: [8086:10fb] type 00 class 0x020000 > [ 2.194109] pci 0000:01:00.0: reg 0x10: [mem 0xd0080000-0xd00fffff 64bit pref] > [ 2.194113] pci 0000:01:00.0: reg 0x18: [io 0xe020-0xe03f] > [ 2.194121] pci 0000:01:00.0: reg 0x20: [mem 0xd0104000-0xd0107fff 64bit pref] > [ 2.194126] pci 0000:01:00.0: reg 0x30: [mem 0xdfd80000-0xdfdfffff pref] > [ 2.194136] pci 0000:01:00.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 > [ 2.194139] pci 0000:01:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs > [ 2.194164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold > > * stall here for 86 seconds * > > [ 88.675114] pci 0000:01:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref] > ... > 7bac54497c3e3b2ca37b7043f1fa78586540f10e is the first bad commit > commit 7bac54497c3e3b2ca37b7043f1fa78586540f10e > Author: Heiner Kallweit <hkallweit1@gmail.com> > Date: Sun Aug 8 19:22:52 2021 +0200 > > PCI/VPD: Determine VPD size in pci_vpd_init() > > Determine VPD size in pci_vpd_init(). > > Quirks set dev->vpd.len to a non-zero value, so they cause us to skip the > dynamic size calculation. Prerequisite is that we move the quirks from > FINAL to HEADER so they are run before pci_vpd_init(). > > Link: https://lore.kernel.org/r/cc4a6538-557a-294d-4f94-e6d1d3c91589@gmail.com > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > > Which unfortunately doesn't revert cleanly I can't test it reverted in > isolation. > > My guess is there's something quirky about the PCI bus on this machine > that causes stalls until we hit timeout, but I'm not sure where to begin > debugging this. Sorry for the inconvenience of this, and thank you very much for doing the bisection to track it down. We *could* revert 7bac54497c3e, but it'd be messy because a bunch of follow-up stuff depends on it. I propose something like the patch below. Would you mind trying it out? commit 4ede9949b93c ("PCI/VPD: Defer VPD sizing until first access") Author: Bjorn Helgaas <bhelgaas@google.com> Date: Mon Sep 13 16:13:26 2021 -0500 PCI/VPD: Defer VPD sizing until first access 7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") reads VPD at enumeration-time to find the size. But this is quite slow, and we don't need the size until we actually need data from VPD. Dave reported a boot slowdown of more than two minutes [1]. Defer the VPD sizing until a driver or the user requests information from VPD. If devices are quirked because VPD is known not to work, clear the vpd.cap pointer so we don't access it at all. [1] https://lore.kernel.org/r/20210913141818.GA27911@codemonkey.org.uk/ Fixes: 7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c index 25557b272a4f..ca823ceee10c 100644 --- a/drivers/pci/vpd.c +++ b/drivers/pci/vpd.c @@ -46,13 +46,12 @@ static struct pci_dev *pci_get_func0_dev(struct pci_dev *dev) } #define PCI_VPD_MAX_SIZE (PCI_VPD_ADDR_MASK + 1) -#define PCI_VPD_SZ_INVALID UINT_MAX /** * pci_vpd_size - determine actual size of Vital Product Data * @dev: pci device struct */ -static size_t pci_vpd_size(struct pci_dev *dev) +static void pci_vpd_size(struct pci_dev *dev) { size_t off = 0, size; unsigned char tag, header[1+2]; /* 1 byte tag, 2 bytes length */ @@ -71,7 +70,7 @@ static size_t pci_vpd_size(struct pci_dev *dev) if (pci_read_vpd(dev, off + 1, 2, &header[1]) != 2) { pci_warn(dev, "failed VPD read at offset %zu\n", off + 1); - return off ?: PCI_VPD_SZ_INVALID; + goto finish; } size = pci_vpd_lrdt_size(header); if (off + size > PCI_VPD_MAX_SIZE) @@ -87,16 +86,19 @@ static size_t pci_vpd_size(struct pci_dev *dev) off += PCI_VPD_SRDT_TAG_SIZE + size; if (tag == PCI_VPD_STIN_END) /* End tag descriptor */ - return off; + goto finish; } } - return off; + goto finish; error: pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset %zu%s\n", header[0], size, off, off == 0 ? "; assume missing optional EEPROM" : ""); - return off ?: PCI_VPD_SZ_INVALID; +finish: + dev->vpd.len = off; + if (off == 0) + dev->vpd.cap = 0; /* No VPD at all */ } /* @@ -145,9 +147,6 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count, loff_t end = pos + count; u8 *buf = arg; - if (!vpd->cap) - return -ENODEV; - if (pos < 0) return -EINVAL; @@ -206,9 +205,6 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count, loff_t end = pos + count; int ret = 0; - if (!vpd->cap) - return -ENODEV; - if (pos < 0 || (pos & 3) || (count & 3)) return -EINVAL; @@ -244,12 +240,6 @@ void pci_vpd_init(struct pci_dev *dev) { dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD); mutex_init(&dev->vpd.lock); - - if (!dev->vpd.len) - dev->vpd.len = pci_vpd_size(dev); - - if (dev->vpd.len == PCI_VPD_SZ_INVALID) - dev->vpd.cap = 0; } static ssize_t vpd_read(struct file *filp, struct kobject *kobj, @@ -294,25 +284,29 @@ const struct attribute_group pci_dev_vpd_attr_group = { void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size) { - unsigned int len = dev->vpd.len; + struct pci_vpd *vpd = &dev->vpd; + unsigned int len; void *buf; int cnt; - if (!dev->vpd.cap) + if (!vpd->cap) return ERR_PTR(-ENODEV); buf = kmalloc(len, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM); - cnt = pci_read_vpd(dev, 0, len, buf); + if (vpd->len == 0) + pci_vpd_size(dev); + + cnt = pci_read_vpd(dev, 0, vpd->len, buf); if (cnt != len) { kfree(buf); return ERR_PTR(-EIO); } if (size) - *size = len; + *size = vpd->len; return buf; } @@ -374,6 +368,7 @@ static int pci_vpd_find_info_keyword(const u8 *buf, unsigned int off, */ ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf) { + struct pci_vpd *vpd; ssize_t ret; if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) { @@ -381,11 +376,27 @@ ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf) if (!dev) return -ENODEV; + vpd = &dev->vpd; + if (!vpd->cap) { + pci_dev_put(dev); + return -ENODEV; + } + + if (vpd->len == 0) + pci_vpd_size(dev); + ret = pci_vpd_read(dev, pos, count, buf); pci_dev_put(dev); return ret; } + vpd = &dev->vpd; + if (!vpd->cap) + return -ENODEV; + + if (vpd->len == 0) + pci_vpd_size(dev); + return pci_vpd_read(dev, pos, count, buf); } EXPORT_SYMBOL(pci_read_vpd); @@ -399,6 +410,7 @@ EXPORT_SYMBOL(pci_read_vpd); */ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf) { + struct pci_vpd *vpd; ssize_t ret; if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) { @@ -406,11 +418,26 @@ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void if (!dev) return -ENODEV; + vpd = &dev->vpd; + if (!vpd->cap) { + pci_dev_put(dev); + return -ENODEV; + } + + if (vpd->len == 0) + pci_vpd_size(dev); + ret = pci_vpd_write(dev, pos, count, buf); pci_dev_put(dev); return ret; } + if (!vpd->cap) + return -ENODEV; + + if (vpd->len == 0) + pci_vpd_size(dev); + return pci_vpd_write(dev, pos, count, buf); } EXPORT_SYMBOL(pci_write_vpd); @@ -500,27 +527,27 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, */ static void quirk_blacklist_vpd(struct pci_dev *dev) { - dev->vpd.len = PCI_VPD_SZ_INVALID; + dev->vpd.cap = 0; pci_warn(dev, FW_BUG "disabling VPD access (can't determine size of non-standard VPD format)\n"); } -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd); -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); /* * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class. */ -DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031, - PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd); +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031, + PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd); static void quirk_chelsio_extend_vpd(struct pci_dev *dev) { @@ -545,7 +572,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev) dev->vpd.len = 2048; } -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID, - quirk_chelsio_extend_vpd); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID, + quirk_chelsio_extend_vpd); #endif ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 23:46 ` Bjorn Helgaas @ 2021-09-14 0:39 ` Dave Jones 2021-09-14 6:21 ` Heiner Kallweit 1 sibling, 0 replies; 53+ messages in thread From: Dave Jones @ 2021-09-14 0:39 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: Linus Torvalds, Heiner Kallweit, linux-kernel, linux-pci On Mon, Sep 13, 2021 at 06:46:08PM -0500, Bjorn Helgaas wrote: > On Mon, Sep 13, 2021 at 10:18:18AM -0400, Dave Jones wrote: > > On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote: > > > So 5.15 isn't shaping up to be a particularly large release, at least > > > in number of commits. At only just over 10k non-merge commits, this is > > > in fact the smallest rc1 we have had in the 5.x series. We're usually > > > hovering in the 12-14k commit range. > > > > This release takes over two minutes longer to boot on one my > > machines than 5.14. The time just seems to be unaccounted for, even > > with initcall_debug > > Sorry for the inconvenience of this, and thank you very much for doing > the bisection to track it down. > > We *could* revert 7bac54497c3e, but it'd be messy because a bunch of > follow-up stuff depends on it. > > I propose something like the patch below. Would you mind trying it > out? This also fixes the problem for me. Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-13 23:46 ` Bjorn Helgaas 2021-09-14 0:39 ` Dave Jones @ 2021-09-14 6:21 ` Heiner Kallweit 2021-09-14 11:26 ` Bjorn Helgaas 1 sibling, 1 reply; 53+ messages in thread From: Heiner Kallweit @ 2021-09-14 6:21 UTC (permalink / raw) To: Bjorn Helgaas, Dave Jones, Linus Torvalds; +Cc: linux-kernel, linux-pci On 14.09.2021 01:46, Bjorn Helgaas wrote: > On Mon, Sep 13, 2021 at 10:18:18AM -0400, Dave Jones wrote: >> On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote: >> > So 5.15 isn't shaping up to be a particularly large release, at least >> > in number of commits. At only just over 10k non-merge commits, this is >> > in fact the smallest rc1 we have had in the 5.x series. We're usually >> > hovering in the 12-14k commit range. >> >> This release takes over two minutes longer to boot on one my >> machines than 5.14. The time just seems to be unaccounted for, even >> with initcall_debug > >> ... >> [ 2.194093] pci 0000:01:00.0: calling quirk_f0_vpd_link+0x0/0x60 @ 1 >> [ 2.194097] pci 0000:01:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs >> [ 2.194100] pci 0000:01:00.0: [8086:10fb] type 00 class 0x020000 >> [ 2.194109] pci 0000:01:00.0: reg 0x10: [mem 0xd0080000-0xd00fffff 64bit pref] >> [ 2.194113] pci 0000:01:00.0: reg 0x18: [io 0xe020-0xe03f] >> [ 2.194121] pci 0000:01:00.0: reg 0x20: [mem 0xd0104000-0xd0107fff 64bit pref] >> [ 2.194126] pci 0000:01:00.0: reg 0x30: [mem 0xdfd80000-0xdfdfffff pref] >> [ 2.194136] pci 0000:01:00.0: calling quirk_igfx_skip_te_disable+0x0/0x50 @ 1 >> [ 2.194139] pci 0000:01:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs >> [ 2.194164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold >> >> * stall here for 86 seconds * >> >> [ 88.675114] pci 0000:01:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref] >> ... > > >> 7bac54497c3e3b2ca37b7043f1fa78586540f10e is the first bad commit >> commit 7bac54497c3e3b2ca37b7043f1fa78586540f10e >> Author: Heiner Kallweit <hkallweit1@gmail.com> >> Date: Sun Aug 8 19:22:52 2021 +0200 >> >> PCI/VPD: Determine VPD size in pci_vpd_init() >> >> Determine VPD size in pci_vpd_init(). >> >> Quirks set dev->vpd.len to a non-zero value, so they cause us to skip the >> dynamic size calculation. Prerequisite is that we move the quirks from >> FINAL to HEADER so they are run before pci_vpd_init(). >> >> Link: https://lore.kernel.org/r/cc4a6538-557a-294d-4f94-e6d1d3c91589@gmail.com >> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> >> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> >> >> >> Which unfortunately doesn't revert cleanly I can't test it reverted in >> isolation. >> >> My guess is there's something quirky about the PCI bus on this machine >> that causes stalls until we hit timeout, but I'm not sure where to begin >> debugging this. > > Sorry for the inconvenience of this, and thank you very much for doing > the bisection to track it down. > > We *could* revert 7bac54497c3e, but it'd be messy because a bunch of > follow-up stuff depends on it. > > I propose something like the patch below. Would you mind trying it > out? > > > commit 4ede9949b93c ("PCI/VPD: Defer VPD sizing until first access") > Author: Bjorn Helgaas <bhelgaas@google.com> > Date: Mon Sep 13 16:13:26 2021 -0500 > > PCI/VPD: Defer VPD sizing until first access > > 7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") reads VPD at > enumeration-time to find the size. But this is quite slow, and we don't > need the size until we actually need data from VPD. Dave reported a boot > slowdown of more than two minutes [1]. > > Defer the VPD sizing until a driver or the user requests information from > VPD. If devices are quirked because VPD is known not to work, clear the > vpd.cap pointer so we don't access it at all. > > [1] https://lore.kernel.org/r/20210913141818.GA27911@codemonkey.org.uk/ > Fixes: 7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c > index 25557b272a4f..ca823ceee10c 100644 > --- a/drivers/pci/vpd.c > +++ b/drivers/pci/vpd.c > @@ -46,13 +46,12 @@ static struct pci_dev *pci_get_func0_dev(struct pci_dev *dev) > } > > #define PCI_VPD_MAX_SIZE (PCI_VPD_ADDR_MASK + 1) > -#define PCI_VPD_SZ_INVALID UINT_MAX > > /** > * pci_vpd_size - determine actual size of Vital Product Data > * @dev: pci device struct > */ > -static size_t pci_vpd_size(struct pci_dev *dev) > +static void pci_vpd_size(struct pci_dev *dev) > { > size_t off = 0, size; > unsigned char tag, header[1+2]; /* 1 byte tag, 2 bytes length */ > @@ -71,7 +70,7 @@ static size_t pci_vpd_size(struct pci_dev *dev) > if (pci_read_vpd(dev, off + 1, 2, &header[1]) != 2) { > pci_warn(dev, "failed VPD read at offset %zu\n", > off + 1); > - return off ?: PCI_VPD_SZ_INVALID; > + goto finish; > } > size = pci_vpd_lrdt_size(header); > if (off + size > PCI_VPD_MAX_SIZE) > @@ -87,16 +86,19 @@ static size_t pci_vpd_size(struct pci_dev *dev) > > off += PCI_VPD_SRDT_TAG_SIZE + size; > if (tag == PCI_VPD_STIN_END) /* End tag descriptor */ > - return off; > + goto finish; > } > } > - return off; > + goto finish; > > error: > pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset %zu%s\n", > header[0], size, off, off == 0 ? > "; assume missing optional EEPROM" : ""); > - return off ?: PCI_VPD_SZ_INVALID; > +finish: > + dev->vpd.len = off; > + if (off == 0) > + dev->vpd.cap = 0; /* No VPD at all */ > } > > /* > @@ -145,9 +147,6 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count, > loff_t end = pos + count; > u8 *buf = arg; > > - if (!vpd->cap) > - return -ENODEV; > - > if (pos < 0) > return -EINVAL; > > @@ -206,9 +205,6 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count, > loff_t end = pos + count; > int ret = 0; > > - if (!vpd->cap) > - return -ENODEV; > - > if (pos < 0 || (pos & 3) || (count & 3)) > return -EINVAL; > > @@ -244,12 +240,6 @@ void pci_vpd_init(struct pci_dev *dev) > { > dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD); > mutex_init(&dev->vpd.lock); > - > - if (!dev->vpd.len) > - dev->vpd.len = pci_vpd_size(dev); > - > - if (dev->vpd.len == PCI_VPD_SZ_INVALID) > - dev->vpd.cap = 0; > } > > static ssize_t vpd_read(struct file *filp, struct kobject *kobj, > @@ -294,25 +284,29 @@ const struct attribute_group pci_dev_vpd_attr_group = { > > void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size) > { > - unsigned int len = dev->vpd.len; > + struct pci_vpd *vpd = &dev->vpd; > + unsigned int len; > void *buf; > int cnt; > > - if (!dev->vpd.cap) > + if (!vpd->cap) > return ERR_PTR(-ENODEV); > > buf = kmalloc(len, GFP_KERNEL); > if (!buf) > return ERR_PTR(-ENOMEM); > > - cnt = pci_read_vpd(dev, 0, len, buf); > + if (vpd->len == 0) > + pci_vpd_size(dev); > + > + cnt = pci_read_vpd(dev, 0, vpd->len, buf); > if (cnt != len) { > kfree(buf); > return ERR_PTR(-EIO); > } > > if (size) > - *size = len; > + *size = vpd->len; > > return buf; > } > @@ -374,6 +368,7 @@ static int pci_vpd_find_info_keyword(const u8 *buf, unsigned int off, > */ > ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf) > { > + struct pci_vpd *vpd; > ssize_t ret; > > if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) { > @@ -381,11 +376,27 @@ ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf) > if (!dev) > return -ENODEV; > > + vpd = &dev->vpd; > + if (!vpd->cap) { > + pci_dev_put(dev); > + return -ENODEV; > + } > + > + if (vpd->len == 0) > + pci_vpd_size(dev); > + > ret = pci_vpd_read(dev, pos, count, buf); > pci_dev_put(dev); > return ret; > } > > + vpd = &dev->vpd; > + if (!vpd->cap) > + return -ENODEV; > + > + if (vpd->len == 0) > + pci_vpd_size(dev); > + > return pci_vpd_read(dev, pos, count, buf); > } > EXPORT_SYMBOL(pci_read_vpd); > @@ -399,6 +410,7 @@ EXPORT_SYMBOL(pci_read_vpd); > */ > ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf) > { > + struct pci_vpd *vpd; > ssize_t ret; > > if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) { > @@ -406,11 +418,26 @@ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void > if (!dev) > return -ENODEV; > > + vpd = &dev->vpd; > + if (!vpd->cap) { > + pci_dev_put(dev); > + return -ENODEV; > + } > + > + if (vpd->len == 0) > + pci_vpd_size(dev); > + > ret = pci_vpd_write(dev, pos, count, buf); > pci_dev_put(dev); > return ret; > } > > + if (!vpd->cap) > + return -ENODEV; > + > + if (vpd->len == 0) > + pci_vpd_size(dev); > + > return pci_vpd_write(dev, pos, count, buf); > } > EXPORT_SYMBOL(pci_write_vpd); > @@ -500,27 +527,27 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, > */ > static void quirk_blacklist_vpd(struct pci_dev *dev) > { > - dev->vpd.len = PCI_VPD_SZ_INVALID; > + dev->vpd.cap = 0; > pci_warn(dev, FW_BUG "disabling VPD access (can't determine size of non-standard VPD format)\n"); > } > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd); > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); > /* Leaving the quirks in FIXUP_HEADER stage would have the advantage that for blacklisted devices the vpd sysfs attribute isn't visibale. The needed changes to the patch are minimal. > * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port > * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class. > */ > -DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031, > - PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd); > +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031, > + PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd); > > static void quirk_chelsio_extend_vpd(struct pci_dev *dev) > { > @@ -545,7 +572,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev) > dev->vpd.len = 2048; > } > > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID, > - quirk_chelsio_extend_vpd); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID, > + quirk_chelsio_extend_vpd); > > #endif > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-14 6:21 ` Heiner Kallweit @ 2021-09-14 11:26 ` Bjorn Helgaas 2021-09-14 17:07 ` Heiner Kallweit 0 siblings, 1 reply; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-14 11:26 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote: > On 14.09.2021 01:46, Bjorn Helgaas wrote: > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); > > /* > > Leaving the quirks in FIXUP_HEADER stage would have the advantage that for > blacklisted devices the vpd sysfs attribute isn't visibale. The needed > changes to the patch are minimal. What do you have in mind? The only thing I can think of would be to add a "pci_dev.no_vpd" bit. "vpd.cap == 0" means the device has no VPD, and "vpd.len == 0" means we haven't determined the size yet. All devices start off with vpd.cap == 0 and vpd.len == 0, so a FIXUP_HEADER quirk would have to set a sentinel value or some other bit. Bjorn ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-14 11:26 ` Bjorn Helgaas @ 2021-09-14 17:07 ` Heiner Kallweit 2021-09-14 21:55 ` Bjorn Helgaas 0 siblings, 1 reply; 53+ messages in thread From: Heiner Kallweit @ 2021-09-14 17:07 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci On 14.09.2021 13:26, Bjorn Helgaas wrote: > On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote: >> On 14.09.2021 01:46, Bjorn Helgaas wrote: > >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); >>> /* >> >> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for >> blacklisted devices the vpd sysfs attribute isn't visibale. The needed >> changes to the patch are minimal. > > What do you have in mind? The only thing I can think of would be to > add a "pci_dev.no_vpd" bit. "vpd.cap == 0" means the device has no > VPD, and "vpd.len == 0" means we haven't determined the size yet. All > devices start off with vpd.cap == 0 and vpd.len == 0, so a > FIXUP_HEADER quirk would have to set a sentinel value or some other > bit. > > Bjorn > Why not leave vpd.len == PCI_VPD_SZ_INVALID as sentinel? And one more question: Why do you move the "if (!vpd->cap)" check from pci_vpd_read() to pci_read_vpd()? At a first glance I see no benefit. Here comes my version. Your changes to pci_vpd_size() I left as-is. I tested the positive case and it works as expected. diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c index 25557b272..04b14c488 100644 --- a/drivers/pci/vpd.c +++ b/drivers/pci/vpd.c @@ -52,7 +52,7 @@ static struct pci_dev *pci_get_func0_dev(struct pci_dev *dev) * pci_vpd_size - determine actual size of Vital Product Data * @dev: pci device struct */ -static size_t pci_vpd_size(struct pci_dev *dev) +static void pci_vpd_size(struct pci_dev *dev) { size_t off = 0, size; unsigned char tag, header[1+2]; /* 1 byte tag, 2 bytes length */ @@ -71,7 +71,7 @@ static size_t pci_vpd_size(struct pci_dev *dev) if (pci_read_vpd(dev, off + 1, 2, &header[1]) != 2) { pci_warn(dev, "failed VPD read at offset %zu\n", off + 1); - return off ?: PCI_VPD_SZ_INVALID; + goto finish; } size = pci_vpd_lrdt_size(header); if (off + size > PCI_VPD_MAX_SIZE) @@ -87,16 +87,19 @@ static size_t pci_vpd_size(struct pci_dev *dev) off += PCI_VPD_SRDT_TAG_SIZE + size; if (tag == PCI_VPD_STIN_END) /* End tag descriptor */ - return off; + goto finish; } } - return off; + goto finish; error: pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset %zu%s\n", header[0], size, off, off == 0 ? "; assume missing optional EEPROM" : ""); - return off ?: PCI_VPD_SZ_INVALID; +finish: + dev->vpd.len = off; + if (off == 0) + dev->vpd.cap = 0; /* No VPD at all */ } /* @@ -145,6 +148,8 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count, loff_t end = pos + count; u8 *buf = arg; + if (vpd->len == 0 && vpd->cap) + pci_vpd_size(dev); if (!vpd->cap) return -ENODEV; @@ -206,6 +211,8 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count, loff_t end = pos + count; int ret = 0; + if (vpd->len == 0 && vpd->cap) + pci_vpd_size(dev); if (!vpd->cap) return -ENODEV; @@ -245,9 +252,6 @@ void pci_vpd_init(struct pci_dev *dev) dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD); mutex_init(&dev->vpd.lock); - if (!dev->vpd.len) - dev->vpd.len = pci_vpd_size(dev); - if (dev->vpd.len == PCI_VPD_SZ_INVALID) dev->vpd.cap = 0; } @@ -294,25 +298,27 @@ const struct attribute_group pci_dev_vpd_attr_group = { void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size) { - unsigned int len = dev->vpd.len; + struct pci_vpd *vpd = &dev->vpd; void *buf; int cnt; - if (!dev->vpd.cap) + if (vpd->len == 0 && vpd->cap) + pci_vpd_size(dev); + if (!vpd->cap) return ERR_PTR(-ENODEV); - buf = kmalloc(len, GFP_KERNEL); + buf = kmalloc(vpd->len, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM); - cnt = pci_read_vpd(dev, 0, len, buf); - if (cnt != len) { + cnt = pci_read_vpd(dev, 0, vpd->len, buf); + if (cnt != vpd->len) { kfree(buf); return ERR_PTR(-EIO); } if (size) - *size = len; + *size = vpd->len; return buf; } -- 2.33.0 ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-14 17:07 ` Heiner Kallweit @ 2021-09-14 21:55 ` Bjorn Helgaas 2021-09-14 22:06 ` Heiner Kallweit 0 siblings, 1 reply; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-14 21:55 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci On Tue, Sep 14, 2021 at 07:07:40PM +0200, Heiner Kallweit wrote: > On 14.09.2021 13:26, Bjorn Helgaas wrote: > > On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote: > >> On 14.09.2021 01:46, Bjorn Helgaas wrote: > > > >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); > >>> /* > >> > >> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for > >> blacklisted devices the vpd sysfs attribute isn't visibale. The needed > >> changes to the patch are minimal. > > > > What do you have in mind? The only thing I can think of would be to > > add a "pci_dev.no_vpd" bit. "vpd.cap == 0" means the device has no > > VPD, and "vpd.len == 0" means we haven't determined the size yet. All > > devices start off with vpd.cap == 0 and vpd.len == 0, so a > > FIXUP_HEADER quirk would have to set a sentinel value or some other > > bit. > > Why not leave vpd.len == PCI_VPD_SZ_INVALID as sentinel? Sentinel values aren't really my favorite thing, but it certainly does have the advantage of hiding the sysfs attribute. > And one more question: Why do you move the "if (!vpd->cap)" check from > pci_vpd_read() to pci_read_vpd()? At a first glance I see no benefit. I'm pretty sure I *had* a reason, but I can't remember right now :( Moving it sure does uglify pci_read_vpd() and pci_write_vpd(), though. What do you think of the following? (This is a diff from v5.15-rc1.) diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c index 25557b272a4f..4be24890132e 100644 --- a/drivers/pci/vpd.c +++ b/drivers/pci/vpd.c @@ -99,6 +99,24 @@ static size_t pci_vpd_size(struct pci_dev *dev) return off ?: PCI_VPD_SZ_INVALID; } +static bool pci_vpd_available(struct pci_dev *dev) +{ + struct pci_vpd *vpd = &dev->vpd; + + if (!vpd->cap) + return false; + + if (vpd->len == 0) { + vpd->len = pci_vpd_size(dev); + if (vpd->len == PCI_VPD_SZ_INVALID) { + vpd->cap = 0; + return false; + } + } + + return true; +} + /* * Wait for last operation to complete. * This code has to spin since there is no other notification from the PCI @@ -145,7 +163,7 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count, loff_t end = pos + count; u8 *buf = arg; - if (!vpd->cap) + if (!pci_vpd_available(dev)) return -ENODEV; if (pos < 0) @@ -206,7 +224,7 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count, loff_t end = pos + count; int ret = 0; - if (!vpd->cap) + if (!pci_vpd_available(dev)) return -ENODEV; if (pos < 0 || (pos & 3) || (count & 3)) @@ -242,14 +260,11 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count, void pci_vpd_init(struct pci_dev *dev) { + if (dev->vpd.len == PCI_VPD_SZ_INVALID) + return; + dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD); mutex_init(&dev->vpd.lock); - - if (!dev->vpd.len) - dev->vpd.len = pci_vpd_size(dev); - - if (dev->vpd.len == PCI_VPD_SZ_INVALID) - dev->vpd.cap = 0; } static ssize_t vpd_read(struct file *filp, struct kobject *kobj, @@ -294,13 +309,14 @@ const struct attribute_group pci_dev_vpd_attr_group = { void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size) { - unsigned int len = dev->vpd.len; + unsigned int len; void *buf; int cnt; - if (!dev->vpd.cap) + if (!pci_vpd_available(dev)) return ERR_PTR(-ENODEV); + len = dev->vpd.len; buf = kmalloc(len, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM); ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-14 21:55 ` Bjorn Helgaas @ 2021-09-14 22:06 ` Heiner Kallweit 2021-09-14 22:33 ` Dave Jones 0 siblings, 1 reply; 53+ messages in thread From: Heiner Kallweit @ 2021-09-14 22:06 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci On 14.09.2021 23:55, Bjorn Helgaas wrote: > On Tue, Sep 14, 2021 at 07:07:40PM +0200, Heiner Kallweit wrote: >> On 14.09.2021 13:26, Bjorn Helgaas wrote: >>> On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote: >>>> On 14.09.2021 01:46, Bjorn Helgaas wrote: >>> >>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd); >>>>> /* >>>> >>>> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for >>>> blacklisted devices the vpd sysfs attribute isn't visibale. The needed >>>> changes to the patch are minimal. >>> >>> What do you have in mind? The only thing I can think of would be to >>> add a "pci_dev.no_vpd" bit. "vpd.cap == 0" means the device has no >>> VPD, and "vpd.len == 0" means we haven't determined the size yet. All >>> devices start off with vpd.cap == 0 and vpd.len == 0, so a >>> FIXUP_HEADER quirk would have to set a sentinel value or some other >>> bit. >> >> Why not leave vpd.len == PCI_VPD_SZ_INVALID as sentinel? > > Sentinel values aren't really my favorite thing, but it certainly does > have the advantage of hiding the sysfs attribute. > >> And one more question: Why do you move the "if (!vpd->cap)" check from >> pci_vpd_read() to pci_read_vpd()? At a first glance I see no benefit. > > I'm pretty sure I *had* a reason, but I can't remember right now :( > Moving it sure does uglify pci_read_vpd() and pci_write_vpd(), though. > > What do you think of the following? (This is a diff from v5.15-rc1.) > > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c > index 25557b272a4f..4be24890132e 100644 > --- a/drivers/pci/vpd.c > +++ b/drivers/pci/vpd.c > @@ -99,6 +99,24 @@ static size_t pci_vpd_size(struct pci_dev *dev) > return off ?: PCI_VPD_SZ_INVALID; > } > > +static bool pci_vpd_available(struct pci_dev *dev) > +{ > + struct pci_vpd *vpd = &dev->vpd; > + > + if (!vpd->cap) > + return false; > + > + if (vpd->len == 0) { > + vpd->len = pci_vpd_size(dev); > + if (vpd->len == PCI_VPD_SZ_INVALID) { > + vpd->cap = 0; > + return false; > + } > + } > + > + return true; > +} > + > /* > * Wait for last operation to complete. > * This code has to spin since there is no other notification from the PCI > @@ -145,7 +163,7 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count, > loff_t end = pos + count; > u8 *buf = arg; > > - if (!vpd->cap) > + if (!pci_vpd_available(dev)) > return -ENODEV; > > if (pos < 0) > @@ -206,7 +224,7 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count, > loff_t end = pos + count; > int ret = 0; > > - if (!vpd->cap) > + if (!pci_vpd_available(dev)) > return -ENODEV; > > if (pos < 0 || (pos & 3) || (count & 3)) > @@ -242,14 +260,11 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count, > > void pci_vpd_init(struct pci_dev *dev) > { > + if (dev->vpd.len == PCI_VPD_SZ_INVALID) > + return; > + > dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD); > mutex_init(&dev->vpd.lock); > - > - if (!dev->vpd.len) > - dev->vpd.len = pci_vpd_size(dev); > - > - if (dev->vpd.len == PCI_VPD_SZ_INVALID) > - dev->vpd.cap = 0; > } > > static ssize_t vpd_read(struct file *filp, struct kobject *kobj, > @@ -294,13 +309,14 @@ const struct attribute_group pci_dev_vpd_attr_group = { > > void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size) > { > - unsigned int len = dev->vpd.len; > + unsigned int len; > void *buf; > int cnt; > > - if (!dev->vpd.cap) > + if (!pci_vpd_available(dev)) > return ERR_PTR(-ENODEV); > > + len = dev->vpd.len; > buf = kmalloc(len, GFP_KERNEL); > if (!buf) > return ERR_PTR(-ENOMEM); > This looks very good to me. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-14 22:06 ` Heiner Kallweit @ 2021-09-14 22:33 ` Dave Jones 2021-09-15 21:31 ` Bjorn Helgaas 0 siblings, 1 reply; 53+ messages in thread From: Dave Jones @ 2021-09-14 22:33 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Bjorn Helgaas, Linus Torvalds, linux-kernel, linux-pci On Wed, Sep 15, 2021 at 12:06:33AM +0200, Heiner Kallweit wrote: > > What do you think of the following? (This is a diff from v5.15-rc1.) > > > > This looks very good to me. fwiw, I tested this too, and it still works. Dave ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Linux 5.15-rc1 2021-09-14 22:33 ` Dave Jones @ 2021-09-15 21:31 ` Bjorn Helgaas 0 siblings, 0 replies; 53+ messages in thread From: Bjorn Helgaas @ 2021-09-15 21:31 UTC (permalink / raw) To: Dave Jones, Heiner Kallweit, Linus Torvalds, linux-kernel, linux-pci On Tue, Sep 14, 2021 at 06:33:27PM -0400, Dave Jones wrote: > On Wed, Sep 15, 2021 at 12:06:33AM +0200, Heiner Kallweit wrote: > > > > What do you think of the following? (This is a diff from v5.15-rc1.) > > > > > > > This looks very good to me. > > fwiw, I tested this too, and it still works. Thanks very much for proactively testing this. I hated to burden you before anybody else had looked at it. I added this to for-linus for v5.15. Bjorn ^ permalink raw reply [flat|nested] 53+ messages in thread
end of thread, other threads:[~2021-09-30 13:44 UTC | newest] Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-09-12 23:58 Linux 5.15-rc1 Linus Torvalds 2021-09-13 3:57 ` Guenter Roeck 2021-09-28 23:18 ` Michael S. Tsirkin 2021-09-30 13:44 ` Guenter Roeck 2021-09-13 14:18 ` Dave Jones 2021-09-13 18:59 ` Heiner Kallweit 2021-09-13 19:51 ` Linus Torvalds 2021-09-13 20:09 ` Bjorn Helgaas 2021-09-13 20:11 ` Heiner Kallweit 2021-09-13 20:15 ` Linus Torvalds 2021-09-13 20:15 ` Dave Jones 2021-09-13 20:22 ` Heiner Kallweit 2021-09-13 20:32 ` Dave Jones 2021-09-13 20:44 ` Linux 5.15-rc1 - 82599ES VPD access isue Heiner Kallweit 2021-09-13 20:44 ` [Intel-wired-lan] " Heiner Kallweit 2021-09-13 23:32 ` Hisashi T Fujinaka 2021-09-13 23:32 ` Hisashi T Fujinaka 2021-09-14 5:51 ` Heiner Kallweit 2021-09-14 5:51 ` Heiner Kallweit 2021-09-14 14:24 ` Dave Jones 2021-09-14 14:24 ` Dave Jones 2021-09-14 18:28 ` Fujinaka, Todd 2021-09-14 18:28 ` Fujinaka, Todd 2021-09-14 20:00 ` Hisashi T Fujinaka 2021-09-14 20:00 ` Hisashi T Fujinaka 2021-09-14 21:51 ` Heiner Kallweit 2021-09-14 21:51 ` Heiner Kallweit 2021-09-15 14:18 ` Hisashi T Fujinaka 2021-09-15 14:18 ` Hisashi T Fujinaka 2021-09-15 16:05 ` Heiner Kallweit 2021-09-15 16:05 ` Heiner Kallweit 2021-09-15 16:16 ` Hisashi T Fujinaka 2021-09-15 16:16 ` Hisashi T Fujinaka 2021-09-15 22:32 ` Bjorn Helgaas 2021-09-15 22:32 ` Bjorn Helgaas 2021-09-15 23:46 ` Hisashi T Fujinaka 2021-09-15 23:46 ` Hisashi T Fujinaka 2021-09-17 15:09 ` Bjorn Helgaas 2021-09-17 15:09 ` Bjorn Helgaas 2021-09-13 20:59 ` Linux 5.15-rc1 Heiner Kallweit 2021-09-13 23:35 ` Bjorn Helgaas 2021-09-13 20:32 ` Heiner Kallweit 2021-09-13 20:36 ` Linus Torvalds 2021-09-13 20:41 ` Dave Jones 2021-09-13 23:46 ` Bjorn Helgaas 2021-09-14 0:39 ` Dave Jones 2021-09-14 6:21 ` Heiner Kallweit 2021-09-14 11:26 ` Bjorn Helgaas 2021-09-14 17:07 ` Heiner Kallweit 2021-09-14 21:55 ` Bjorn Helgaas 2021-09-14 22:06 ` Heiner Kallweit 2021-09-14 22:33 ` Dave Jones 2021-09-15 21:31 ` Bjorn Helgaas
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.