* 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13
@ 2021-07-20 22:32 Andy Smith
  2021-07-21  8:10 ` Jan Beulich
  2021-07-25 14:06 ` Andy Smith
  0 siblings, 2 replies; 7+ messages in thread
From: Andy Smith @ 2021-07-20 22:32 UTC (permalink / raw)
  To: xen-devel

Hi,

I have a Debian 10 (buster/stable) dom0 running hypervisor 4.14.2.
For almost 2 years it's been using the packaged Debian stable kernel
which is 4.19.x.

Last night I upgraded the kernel to the buster-backports package
which is based on 5.10.40 and about 4 hours later got this:

Jul 20 02:17:54 lamb kernel: [21061.388607] sg[0] phys_addr:0x00000015eb803000 offset:0 length:4096 dma_address:0x000000209e7b7000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.389775] sg[1] phys_addr:0x00000015eb7bc000 offset:0 length:4096 dma_address:0x000000209e7b8000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.390874] sg[2] phys_addr:0x00000015eb809000 offset:0 length:4096 dma_address:0x000000209e7b9000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.391974] sg[3] phys_addr:0x00000015eb766000 offset:0 length:4096 dma_address:0x000000209e7ba000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.393042] sg[4] phys_addr:0x00000015eb7a3000 offset:0 length:4096 dma_address:0x000000209e7bb000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.394086] sg[5] phys_addr:0x00000015eb7c6000 offset:0 length:4096 dma_address:0x000000209e7bc000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.395078] sg[6] phys_addr:0x00000015eb7c2000 offset:0 length:4096 dma_address:0x000000209e7bd000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.396042] sg[7] phys_addr:0x00000015eb7a9000 offset:0 length:4096 dma_address:0x000000209e7be000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.397004] sg[8] phys_addr:0x00000015eb775000 offset:0 length:4096 dma_address:0x000000209e7bf000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.397971] sg[9] phys_addr:0x00000015eb7c7000 offset:0 length:4096 dma_address:0x00000020ff520000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.398889] sg[10] phys_addr:0x00000015eb7cb000 offset:0 length:4096 dma_address:0x00000020ff521000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.399814] sg[11] phys_addr:0x00000015eb7e3000 offset:0 length:61952 dma_address:0x00000020ff522000 dma_length:61952
Jul 20 02:17:54 lamb kernel: [21061.400754] sg[12] phys_addr:0x00000015eb7f2200 offset:512 length:24064 dma_address:0x00000020ff531200 dma_length:24064
Jul 20 02:17:54 lamb kernel: [21061.401781] ------------[ cut here ]------------
Jul 20 02:17:54 lamb kernel: [21061.402738] Invalid SGL for payload:131072 nents:13
Jul 20 02:17:54 lamb kernel: [21061.403724] WARNING: CPU: 1 PID: 12669 at drivers/nvme/host/pci.c:716 nvme_map_data+0x7e0/0x820 [nvme]
Jul 20 02:17:54 lamb kernel: [21061.404728] Modules linked in: binfmt_misc ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpmss nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_limit nfnetlink_log nfnetlink xt_NFLOG xt_multiport xt_tcpudp ip6table_filter ip6_tables iptable_filter bonding btrfs blake2b_generic dm_snapshot dm_bufio intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm intel_powerclamp crc32_pclmul ghash_clmulni_intel ipmi_ssif aesni_intel libaes crypto_simd cryptd glue_helper snd_hda_intel snd_intel_dspcfg mei_wdt soundwire_intel soundwire_generic_allocation nvme wdat_wdt snd_soc_core ast snd_compress watchdog drm_vram_helper drm_ttm_helper soundwire_cadence pcspkr nvme_core ttm snd_hda_codec drm_kms_helper snd_hda_core i2c_i801 snd_hwdep i2c_smbus cec soundwire_bus snd_pcm drm snd_timer snd soundcore igb ptp pps_core i2c_algo_bit joydev mei_me sg mei intel_lpss_pci intel_lpss idma64 acpi_ipmi ipmi_si ipmi_devintf ioatdma dca wmi ipmi_msghandler button dm_mod xenfs xen_acpi_processor
Jul 20 02:17:54 lamb kernel: [21061.404831]  xen_privcmd xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid456 libcrc32c crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx evdev hid_generic usbhid hid raid6_pq raid0 multipath linear raid10 raid1 md_mod sd_mod t10_pi crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common xhci_pci ahci libahci crc32c_intel xhci_hcd libata usbcore scsi_mod usb_common
Jul 20 02:17:54 lamb kernel: [21061.417998] CPU: 1 PID: 12669 Comm: 62.xvda-0 Not tainted 5.10.0-0.bpo.7-amd64 #1 Debian 5.10.40-1~bpo10+1
Jul 20 02:17:54 lamb kernel: [21061.418459] Hardware name: Supermicro Super Server/X11SRM-VF, BIOS 1.2a 02/18/2019
Jul 20 02:17:54 lamb kernel: [21061.418922] RIP: e030:nvme_map_data+0x7e0/0x820 [nvme]
Jul 20 02:17:54 lamb kernel: [21061.419354] Code: d0 7b c0 48 c7 c7 40 d6 7b c0 e8 5b 44 c9 c0 8b 93 4c 01 00 00 f6 43 1e 04 75 36 8b 73 28 48 c7 c7 20 9c 7b c0 e8 8b 71 09 c1 <0f> 0b 41 bd 0a 00 00 00 e9 f7 fe ff ff 48 8d bd 68 02 00 00 48 89
Jul 20 02:17:54 lamb kernel: [21061.420271] RSP: e02b:ffffc90044797930 EFLAGS: 00010286
Jul 20 02:17:54 lamb kernel: [21061.420727] RAX: 0000000000000000 RBX: ffff888157db4200 RCX: 0000000000000027
Jul 20 02:17:54 lamb kernel: [21061.421186] RDX: 0000000000000027 RSI: ffff888292858a00 RDI: ffff888292858a08
Jul 20 02:17:54 lamb kernel: [21061.421639] RBP: ffff888103243000 R08: 0000000000000000 R09: c00000010000118b
Jul 20 02:17:54 lamb kernel: [21061.422090] R10: 0000000000165920 R11: ffffc90044797738 R12: ffffffffc07b9bd0
Jul 20 02:17:54 lamb kernel: [21061.422583] R13: 000000000000000d R14: 0000000000000000 R15: 000000000000000d
Jul 20 02:17:54 lamb kernel: [21061.423052] FS:  0000000000000000(0000) GS:ffff888292840000(0000) knlGS:0000000000000000
Jul 20 02:17:54 lamb kernel: [21061.423518] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 02:17:54 lamb kernel: [21061.423986] CR2: 00007f909a037c30 CR3: 000000010d2dc000 CR4: 0000000000050660
Jul 20 02:17:54 lamb kernel: [21061.424472] Call Trace:
Jul 20 02:17:54 lamb kernel: [21061.424943]  nvme_queue_rq+0x98/0x190 [nvme]
Jul 20 02:17:54 lamb kernel: [21061.425425]  blk_mq_dispatch_rq_list+0x123/0x7d0
Jul 20 02:17:54 lamb kernel: [21061.425904]  ? sbitmap_get+0x66/0x140
Jul 20 02:17:54 lamb kernel: [21061.426385]  ? elv_rb_del+0x1f/0x30
Jul 20 02:17:54 lamb kernel: [21061.426909]  ? deadline_remove_request+0x55/0xc0
Jul 20 02:17:54 lamb kernel: [21061.427373]  __blk_mq_do_dispatch_sched+0x164/0x2d0
Jul 20 02:17:54 lamb kernel: [21061.427843]  __blk_mq_sched_dispatch_requests+0x135/0x170
Jul 20 02:17:54 lamb kernel: [21061.428310]  blk_mq_sched_dispatch_requests+0x30/0x60
Jul 20 02:17:54 lamb kernel: [21061.428795]  __blk_mq_run_hw_queue+0x51/0xd0
Jul 20 02:17:54 lamb kernel: [21061.429269]  __blk_mq_delay_run_hw_queue+0x141/0x160
Jul 20 02:17:54 lamb kernel: [21061.429752]  blk_mq_sched_insert_requests+0x6a/0xf0
Jul 20 02:17:54 lamb kernel: [21061.430233]  blk_mq_flush_plug_list+0x119/0x1b0
Jul 20 02:17:54 lamb kernel: [21061.430756]  blk_flush_plug_list+0xd7/0x100
Jul 20 02:17:54 lamb kernel: [21061.431241]  blk_finish_plug+0x21/0x30
Jul 20 02:17:54 lamb kernel: [21061.431734]  dispatch_rw_block_io+0x6a5/0x9a0 [xen_blkback]
Jul 20 02:17:54 lamb kernel: [21061.432220]  __do_block_io_op+0x31d/0x620 [xen_blkback]
Jul 20 02:17:54 lamb kernel: [21061.432714]  ? _raw_spin_unlock_irqrestore+0x14/0x20
Jul 20 02:17:54 lamb kernel: [21061.433193]  ? try_to_del_timer_sync+0x4d/0x80
Jul 20 02:17:54 lamb kernel: [21061.433680]  xen_blkif_schedule+0xda/0x670 [xen_blkback]
Jul 20 02:17:54 lamb kernel: [21061.434160]  ? __schedule+0x2c6/0x770
Jul 20 02:17:54 lamb kernel: [21061.434679]  ? finish_wait+0x80/0x80
Jul 20 02:17:54 lamb kernel: [21061.435129]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
Jul 20 02:17:54 lamb kernel: [21061.435571]  kthread+0x116/0x130
Jul 20 02:17:54 lamb kernel: [21061.436002]  ? kthread_park+0x80/0x80
Jul 20 02:17:54 lamb kernel: [21061.436422]  ret_from_fork+0x22/0x30
Jul 20 02:17:54 lamb kernel: [21061.436846] ---[ end trace 1d90be7aea2d9148 ]---
Jul 20 02:17:54 lamb kernel: [21061.437250] blk_update_request: I/O error, dev nvme0n1, sector 912000815 op 0x1:(WRITE) flags 0x800 phys_seg 13 prio class 0
Jul 20 02:17:54 lamb kernel: [21061.446344] md/raid1:md4: Disk failure on nvme0n1, disabling device.
Jul 20 02:17:54 lamb kernel: [21061.446344] md/raid1:md4: Operation continuing on 1 devices.

I was able to fail and then re-add nvme0n1 to the MD array, which
resynced successfully, so no reboot was needed. The fail/re-add cycle
was roughly the following (a sketch rather than an exact transcript):
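
# mark the member failed (the kernel had already flagged it), remove
# it, then re-add it and watch the resync complete
$ sudo mdadm /dev/md4 --fail /dev/nvme0n1
$ sudo mdadm /dev/md4 --remove /dev/nvme0n1
$ sudo mdadm /dev/md4 --re-add /dev/nvme0n1
$ cat /proc/mdstat

But then later: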

Jul 20 20:43:23 lamb kernel: [87388.876154] blk_update_request: I/O error, dev nvme0n1, sector 916064223 op 0x1:(WRITE) flags 0x800 phys_seg 28 prio class 0
Jul 20 20:43:23 lamb kernel: [87388.877750] md/raid1:md4: Disk failure on nvme0n1, disabling device.
Jul 20 20:43:23 lamb kernel: [87388.877750] md/raid1:md4: Operation continuing on 1 devices.

So I had to do a re-add again. There was no call trace this time,
because this code path seems to be a WARN(DO_ONCE()) and so only
warns on the first occurrence.

I do not suspect an NVMe device failure, given how quickly this
happened after changing the kernel. I am wondering if there is some
bug that was fixed and backported by Debian to 4.19.x but not to the
buster-backports kernel?

I've reported this to linux-nvme as well, but I did find this thread
from 2017 with a similar warning:

    http://lists.infradead.org/pipermail/linux-nvme/2017-July/012055.html

In it Christoph Hellwig says, "Given that this is a Xen system I
wonder if swiotlb-xen is involved here, which does some odd chunking
of DMA translations?" so I thought I'd ask here as well in case
anyone is aware of something like this. Unfortunately that thread
ends without confirmed resolution.

If you are aware of any such issue+fix I will check if it got into
Debian's 4.19.x.

I haven't simply tested the latest mainline kernel yet because this
is a production machine and I'll need to schedule downtime, which I'd
rather not do unless there is a known issue and fix.

If you need any additional info or have any other ideas please do
let me know.

Thanks!
Andy



* Re: 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13
  2021-07-20 22:32 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13 Andy Smith
@ 2021-07-21  8:10 ` Jan Beulich
  2021-07-21 14:19   ` Andy Smith
  2021-07-25 14:06 ` Andy Smith
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2021-07-21  8:10 UTC (permalink / raw)
  To: Andy Smith; +Cc: xen-devel

On 21.07.2021 00:32, Andy Smith wrote:
> I have a Debian 10 (buster/stable) dom0 running hypervisor 4.14.2.
> For almost 2 years it's been using the packaged Debian stable kernel
> which is 4.19.x.
> 
> Last night I upgraded the kernel to the buster-backports package
> which is based on 5.10.40 and about 4 hours later got this:
> 
> Jul 20 02:17:54 lamb kernel: [21061.388607] sg[0] phys_addr:0x00000015eb803000 offset:0 length:4096 dma_address:0x000000209e7b7000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.389775] sg[1] phys_addr:0x00000015eb7bc000 offset:0 length:4096 dma_address:0x000000209e7b8000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.390874] sg[2] phys_addr:0x00000015eb809000 offset:0 length:4096 dma_address:0x000000209e7b9000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.391974] sg[3] phys_addr:0x00000015eb766000 offset:0 length:4096 dma_address:0x000000209e7ba000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.393042] sg[4] phys_addr:0x00000015eb7a3000 offset:0 length:4096 dma_address:0x000000209e7bb000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.394086] sg[5] phys_addr:0x00000015eb7c6000 offset:0 length:4096 dma_address:0x000000209e7bc000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.395078] sg[6] phys_addr:0x00000015eb7c2000 offset:0 length:4096 dma_address:0x000000209e7bd000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.396042] sg[7] phys_addr:0x00000015eb7a9000 offset:0 length:4096 dma_address:0x000000209e7be000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.397004] sg[8] phys_addr:0x00000015eb775000 offset:0 length:4096 dma_address:0x000000209e7bf000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.397971] sg[9] phys_addr:0x00000015eb7c7000 offset:0 length:4096 dma_address:0x00000020ff520000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.398889] sg[10] phys_addr:0x00000015eb7cb000 offset:0 length:4096 dma_address:0x00000020ff521000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.399814] sg[11] phys_addr:0x00000015eb7e3000 offset:0 length:61952 dma_address:0x00000020ff522000 dma_length:61952
> Jul 20 02:17:54 lamb kernel: [21061.400754] sg[12] phys_addr:0x00000015eb7f2200 offset:512 length:24064 dma_address:0x00000020ff531200 dma_length:24064

The nvme driver, by calling blk_queue_virt_boundary(), announces that
it wants sg elements split at 4k boundaries. The last two segments,
while contiguous in both physical and DMA space, have been split at
a boundary that's not in line with this driver restriction. (Element
11 is [perhaps not just by coincidence] 11*11 sectors long. Adding
another 11-sector request would bump the total size of the element
past 64k, which iirc the block layer uses as an upper cap on sg
element size. Hence I expect the series of submitted requests to have
remained split at that position, rather than the bio-s that sg[12] is
comprised of having been merged into sg[11].)
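
For reference, the arithmetic behind that can be checked in a shell
(sizes taken from the sg dump above, the 64k cap as recalled):

$ echo $(( 61952 / 512 ))       # 121 sectors, i.e. 11*11
$ echo $(( 61952 + 11 * 512 ))  # 67584, past the 64KiB (65536) cap
$ echo $(( 61952 % 4096 ))      # 512, so sg[11] doesn't end on a 4k boundary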

Since xen-blkback only talks in terms of bio-s, I don't think it is
the party responsible for honoring such driver restrictions. Instead
I'd expect the block layer's bio merging to be where this needs to be
observed. Perhaps it simply doesn't expect to be passed requests in
multiples of 11 sectors together with the capping at 64k (as said -
iirc) and driver restrictions on where splits may occur? And as to
earlier Linux versions working - perhaps the merging logic was less
aggressive back then?

> I've reported this to linux-nvme as well, but I did find this thread
> from 2017 with a similar warning:
> 
>     http://lists.infradead.org/pipermail/linux-nvme/2017-July/012055.html
> 
> In it Christoph Hellwig says, "Given that this is a Xen system I
> wonder if swiotlb-xen is involved here, which does some odd chunking
> of DMA translations?" so I thought I'd ask here as well in case
> anyone is aware of something like this. Unfortunately that thread
> ends without confirmed resolution.

I don't think swiotlb has screwed this up: From the dump of the sg
list we can see that up to element 11 all elements are suitably 4k
aligned. It wouldn't alter the element sizes of an sg list it gets
handed. Yet if the size of a non-final list element isn't a multiple
of the boundary restriction the driver has put in place, then - no
matter whether offset-into-page values got changed - the requirement
is necessarily violated.

(As an aside, seeing the DMA addresses in the dumped sg list, it is
quite clear that this isn't a very efficient way to express the
request to hardware. Only 2 elements would really be needed, one
covering elements 0-8 and the other covering elements 9-12. All
that matters to hardware is contiguity in DMA space.)
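
That contiguity is easy to double-check from the dma_address/dma_length
values dumped above:

$ printf '0x%x\n' $(( 0x209e7bf000 + 4096 ))   # 0x209e7c0000, not sg[9]'s 0x20ff520000
$ printf '0x%x\n' $(( 0x20ff522000 + 61952 ))  # 0x20ff531200, i.e. sg[12]'s dma_address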

Jan




* Re: 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13
  2021-07-21  8:10 ` Jan Beulich
@ 2021-07-21 14:19   ` Andy Smith
  2021-07-21 14:49     ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Smith @ 2021-07-21 14:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

Hi Jan,

On Wed, Jul 21, 2021 at 10:10:13AM +0200, Jan Beulich wrote:
> Since xen-blkback only talks in terms of bio-s, I don't think it is
> the party responsible for honoring such driver restrictions. Instead
> I'd expect the block layer's bio merging to be where this needs to be
> observed. Perhaps it simply doesn't expect to be passed requests in
> multiples of 11 sectors together with the capping at 64k (as said -
> iirc) and driver restrictions on where splits may occur? And as to
> earlier Linux versions working - perhaps the merging logic was less
> aggressive back then?

I later realised that there was another change in my setup besides
the kernel version going from 4.19 to 5.10: I gave the dom0 8GiB of
memory instead of the 3GiB it had before.

I understand that with less than 4GiB of memory the use of swiotlb is
disabled, so previously it was never used but now it is. Perhaps the
bug is in there?

I was told that the only simple way on a Xen dom0 to disable use of
swiotlb would be to set the memory below 4GiB again, so I might try
that.
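
(On this Debian dom0 that would presumably just mean pinning dom0_mem
on the Xen command line again and rebooting, i.e. something along the
lines of

    GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=3072M,max:3072M"

in /etc/default/grub followed by update-grub, rather than relying on
dom0 ballooning.)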

I was also pointed to this patch as a maybe fix for my issue:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=5f89468e2f060031cd89fd4287298e0eaf246bf6
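
(To see where that commit has ended up, I guess something like this in
a kernel git tree would do:

$ git describe --contains 5f89468e2f06
$ git log --oneline v5.10.40..stable/linux-5.10.y -- kernel/dma/swiotlb.c

assuming a "stable" remote tracking linux-5.10.y.)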

Thanks,
Andy



* Re: 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13
  2021-07-21 14:19   ` Andy Smith
@ 2021-07-21 14:49     ` Jan Beulich
  2021-07-23 20:10       ` Andy Smith
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2021-07-21 14:49 UTC (permalink / raw)
  To: Andy Smith; +Cc: xen-devel

On 21.07.2021 16:19, Andy Smith wrote:
> On Wed, Jul 21, 2021 at 10:10:13AM +0200, Jan Beulich wrote:
>> Since xen-blkback only talks in terms of bio-s, I don't think it is
>> the party responsible for honoring such driver restrictions. Instead
>> I'd expect the block layer's bio merging to be where this needs to be
>> observed. Perhaps it simply doesn't expect to be passed requests in
>> multiples of 11 sectors together with the capping at 64k (as said -
>> iirc) and driver restrictions on where splits may occur? And as to
>> earlier Linux versions working - perhaps the merging logic was less
>> aggressive back then?
> 
> I later realised that there was another change in my setup besides
> the kernel version going from 4.19 to 5.10: I gave the dom0 8GiB of
> memory instead of the 3GiB it had before.
> 
> I understand that with less than 4GiB of memory the use of swiotlb is
> disabled, so previously it was never used but now it is. Perhaps the
> bug is in there?
> 
> I was told that the only simple way on a Xen dom0 to disable use of
> swiotlb would be to set the memory below 4GiB again, so I might try
> that.

I have no idea where you take this 4GiB aspect from. What the kernel
considers "below 4GiB" in its view of the world may be at a much
higher address in system address space. And the mere amount of
memory doesn't matter here at all.

> I was also pointed to this patch as a maybe fix for my issue:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=5f89468e2f060031cd89fd4287298e0eaf246bf6

Yes, this looks possibly related.

Jan




* Re: 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13
  2021-07-21 14:49     ` Jan Beulich
@ 2021-07-23 20:10       ` Andy Smith
  2021-07-23 20:37         ` Andy Smith
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Smith @ 2021-07-23 20:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

Hi Jan,

On Wed, Jul 21, 2021 at 04:49:26PM +0200, Jan Beulich wrote:
> On 21.07.2021 16:19, Andy Smith wrote:
> > I understand that with less than 4GiB of memory the use of swiotlb is
> > disabled, so previously it was never used but now it is. Perhaps the
> > bug is in there?
> > 
> > I was told that the only simple way on a Xen dom0 to disable use of
> > swiotlb would be to set the memory below 4GiB again, so I might try
> > that.
> 
> I have no idea where you take this 4GiB aspect from. What the kernel
> considers "below 4GiB" in its view of the world may be at a much
> higher address in system address space. And the mere amount of
> memory doesn't matter here at all.

Ah, I was taking that from:

    https://elixir.bootlin.com/linux/v5.10.40/source/arch/x86/kernel/pci-swiotlb.c#L41

…which I had found while looking into how to disable use of
swiotlb. But I think I'm probably confused - should I only be
looking at arch/x86/xen/pci-swiotlb-xen.c in the case of a PV domain
like dom0?

I have not been able to reproduce the problem by giving a test
system with identical hardware more RAM and getting fio in a guest
to do random reads with a blocksize between 4kiB and 4MiB.
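
The fio job was along these lines, run from inside a test guest
against its own disk (the device name and run time here are just
placeholders, not the exact job I used):

$ fio --name=repro --filename=/dev/xvdb --rw=randread --bsrange=4k-4m \
      --direct=1 --ioengine=libaio --iodepth=32 --time_based --runtime=600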

Perhaps it is highly workload dependent then. In some ways it's a
pity that I do not get call traces for the later occurrences as then
I could see if it's always the same 62.xvda-0 process (and thus same
guest) triggering it.

It's happened three more times since my previous email, but these
have been up to 46 hours apart. These were all reads, so MD just
satisfied the read from the other device without kicking the nvme
device out.

Hmm, I have the sector offset in the MD device so maybe I can
convert that into a logical volume to know if a particular guest is
provoking it…

If anyone happens to have any suggestions as to what kind of IO might
provoke this at all so I could maybe get fio to reproduce it, please
do let me know.

Thanks,
Andy



* Re: 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13
  2021-07-23 20:10       ` Andy Smith
@ 2021-07-23 20:37         ` Andy Smith
  0 siblings, 0 replies; 7+ messages in thread
From: Andy Smith @ 2021-07-23 20:37 UTC (permalink / raw)
  To: xen-devel

On Fri, Jul 23, 2021 at 08:10:28PM +0000, Andy Smith wrote:
> Hmm, I have the sector offset in the MD device so maybe I can
> convert that into a logical volume to know if a particular guest is
> provoking it…

So for anyone who ever wants to do that sort of thing:

# Find the offset, from the start of the PV, at which LVM places its
# physical extents (PEs)
$ sudo pvs --noheadings -o pe_start --units s /dev/md4
    2048S
# Find the size in sectors of each PE
$ sudo pvdisplay --units s /dev/md4 | grep 'PE Size'
  PE Size               8192 Se
# For each failing sector, compute its offset into the PE area and its (approximate) PE number
$ for sect in 912000815 916064223 1934755601 914360207 1936852857; do lv_sect=$((sect-2048)); pe=$((sect / 8192)); printf "%s: sector %s PE %s\n" $sect $lv_sect $pe; done
912000815: sector 911998767 PE 111328
916064223: sector 916062175 PE 111824
1934755601: sector 1934753553 PE 236176
914360207: sector 914358159 PE 111616
1936852857: sector 1936850809 PE 236432

Looking the PE numbers up in the output of "pvdisplay --maps --units
s /dev/md4" I can see it's three hits for one guest and two for
another.
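
To map a PE number back to an LV without eyeballing the whole --maps
output, an awk one-liner along these lines works, assuming the usual
pvdisplay layout:

$ sudo pvdisplay --maps --units s /dev/md4 | awk -v pe=111328 '
    /Physical extent/ { lo = $3 + 0; hi = $5 + 0 }
    /Logical volume/  { if (pe >= lo && pe <= hi) print $3 }'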

I will see if I can move the guest that was affected three times to
test hardware.

Cheers,
Andy



* Re: 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13
  2021-07-20 22:32 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13 Andy Smith
  2021-07-21  8:10 ` Jan Beulich
@ 2021-07-25 14:06 ` Andy Smith
  1 sibling, 0 replies; 7+ messages in thread
From: Andy Smith @ 2021-07-25 14:06 UTC (permalink / raw)
  To: xen-devel

Hello,

On Tue, Jul 20, 2021 at 10:32:39PM +0000, Andy Smith wrote:
> I have a Debian 10 (buster/stable) dom0 running hypervisor 4.14.2.
> For almost 2 years it's been using the packaged Debian stable kernel
> which is 4.19.x.
> 
> Last night I upgraded the kernel to the buster-backports package
> which is based on 5.10.40 and about 4 hours later got this:
> 
> Jul 20 02:17:54 lamb kernel: [21061.388607] sg[0] phys_addr:0x00000015eb803000 offset:0 length:4096 dma_address:0x000000209e7b7000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.389775] sg[1] phys_addr:0x00000015eb7bc000 offset:0 length:4096 dma_address:0x000000209e7b8000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.390874] sg[2] phys_addr:0x00000015eb809000 offset:0 length:4096 dma_address:0x000000209e7b9000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.391974] sg[3] phys_addr:0x00000015eb766000 offset:0 length:4096 dma_address:0x000000209e7ba000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.393042] sg[4] phys_addr:0x00000015eb7a3000 offset:0 length:4096 dma_address:0x000000209e7bb000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.394086] sg[5] phys_addr:0x00000015eb7c6000 offset:0 length:4096 dma_address:0x000000209e7bc000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.395078] sg[6] phys_addr:0x00000015eb7c2000 offset:0 length:4096 dma_address:0x000000209e7bd000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.396042] sg[7] phys_addr:0x00000015eb7a9000 offset:0 length:4096 dma_address:0x000000209e7be000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.397004] sg[8] phys_addr:0x00000015eb775000 offset:0 length:4096 dma_address:0x000000209e7bf000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.397971] sg[9] phys_addr:0x00000015eb7c7000 offset:0 length:4096 dma_address:0x00000020ff520000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.398889] sg[10] phys_addr:0x00000015eb7cb000 offset:0 length:4096 dma_address:0x00000020ff521000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.399814] sg[11] phys_addr:0x00000015eb7e3000 offset:0 length:61952 dma_address:0x00000020ff522000 dma_length:61952
> Jul 20 02:17:54 lamb kernel: [21061.400754] sg[12] phys_addr:0x00000015eb7f2200 offset:512 length:24064 dma_address:0x00000020ff531200 dma_length:24064
> Jul 20 02:17:54 lamb kernel: [21061.401781] ------------[ cut here ]------------
> Jul 20 02:17:54 lamb kernel: [21061.402738] Invalid SGL for payload:131072 nents:13
> Jul 20 02:17:54 lamb kernel: [21061.403724] WARNING: CPU: 1 PID: 12669 at drivers/nvme/host/pci.c:716 nvme_map_data+0x7e0/0x820 [nvme]
> Jul 20 02:17:54 lamb kernel: [21061.404728] Modules linked in: binfmt_misc ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpmss nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_limit nfnetlink_log nfnetlink xt_NFLOG xt_multiport xt_tcpudp ip6table_filter ip6_tables iptable_filter bonding btrfs blake2b_generic dm_snapshot dm_bufio intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm intel_powerclamp crc32_pclmul ghash_clmulni_intel ipmi_ssif aesni_intel libaes crypto_simd cryptd glue_helper snd_hda_intel snd_intel_dspcfg mei_wdt soundwire_intel soundwire_generic_allocation nvme wdat_wdt snd_soc_core ast snd_compress watchdog drm_vram_helper drm_ttm_helper soundwire_cadence pcspkr nvme_core ttm snd_hda_codec drm_kms_helper snd_hda_core i2c_i801 snd_hwdep i2c_smbus cec soundwire_bus snd_pcm drm snd_timer snd soundcore igb ptp pps_core i2c_algo_bit joydev mei_me sg mei intel_lpss_pci intel_lpss idma64 acpi_ipmi ipmi_si ipmi_devintf ioatdma dca wmi ipmi_msghandler button dm_mod xenfs xen_acpi_processor
> Jul 20 02:17:54 lamb kernel: [21061.404831]  xen_privcmd xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid456 libcrc32c crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx evdev hid_generic usbhid hid raid6_pq raid0 multipath linear raid10 raid1 md_mod sd_mod t10_pi crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common xhci_pci ahci libahci crc32c_intel xhci_hcd libata usbcore scsi_mod usb_common
> Jul 20 02:17:54 lamb kernel: [21061.417998] CPU: 1 PID: 12669 Comm: 62.xvda-0 Not tainted 5.10.0-0.bpo.7-amd64 #1 Debian 5.10.40-1~bpo10+1
> Jul 20 02:17:54 lamb kernel: [21061.418459] Hardware name: Supermicro Super Server/X11SRM-VF, BIOS 1.2a 02/18/2019
> Jul 20 02:17:54 lamb kernel: [21061.418922] RIP: e030:nvme_map_data+0x7e0/0x820 [nvme]
> Jul 20 02:17:54 lamb kernel: [21061.419354] Code: d0 7b c0 48 c7 c7 40 d6 7b c0 e8 5b 44 c9 c0 8b 93 4c 01 00 00 f6 43 1e 04 75 36 8b 73 28 48 c7 c7 20 9c 7b c0 e8 8b 71 09 c1 <0f> 0b 41 bd 0a 00 00 00 e9 f7 fe ff ff 48 8d bd 68 02 00 00 48 89
> Jul 20 02:17:54 lamb kernel: [21061.420271] RSP: e02b:ffffc90044797930 EFLAGS: 00010286
> Jul 20 02:17:54 lamb kernel: [21061.420727] RAX: 0000000000000000 RBX: ffff888157db4200 RCX: 0000000000000027
> Jul 20 02:17:54 lamb kernel: [21061.421186] RDX: 0000000000000027 RSI: ffff888292858a00 RDI: ffff888292858a08
> Jul 20 02:17:54 lamb kernel: [21061.421639] RBP: ffff888103243000 R08: 0000000000000000 R09: c00000010000118b
> Jul 20 02:17:54 lamb kernel: [21061.422090] R10: 0000000000165920 R11: ffffc90044797738 R12: ffffffffc07b9bd0
> Jul 20 02:17:54 lamb kernel: [21061.422583] R13: 000000000000000d R14: 0000000000000000 R15: 000000000000000d
> Jul 20 02:17:54 lamb kernel: [21061.423052] FS:  0000000000000000(0000) GS:ffff888292840000(0000) knlGS:0000000000000000
> Jul 20 02:17:54 lamb kernel: [21061.423518] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 20 02:17:54 lamb kernel: [21061.423986] CR2: 00007f909a037c30 CR3: 000000010d2dc000 CR4: 0000000000050660
> Jul 20 02:17:54 lamb kernel: [21061.424472] Call Trace:
> Jul 20 02:17:54 lamb kernel: [21061.424943]  nvme_queue_rq+0x98/0x190 [nvme]
> Jul 20 02:17:54 lamb kernel: [21061.425425]  blk_mq_dispatch_rq_list+0x123/0x7d0
> Jul 20 02:17:54 lamb kernel: [21061.425904]  ? sbitmap_get+0x66/0x140
> Jul 20 02:17:54 lamb kernel: [21061.426385]  ? elv_rb_del+0x1f/0x30
> Jul 20 02:17:54 lamb kernel: [21061.426909]  ? deadline_remove_request+0x55/0xc0
> Jul 20 02:17:54 lamb kernel: [21061.427373]  __blk_mq_do_dispatch_sched+0x164/0x2d0
> Jul 20 02:17:54 lamb kernel: [21061.427843]  __blk_mq_sched_dispatch_requests+0x135/0x170
> Jul 20 02:17:54 lamb kernel: [21061.428310]  blk_mq_sched_dispatch_requests+0x30/0x60
> Jul 20 02:17:54 lamb kernel: [21061.428795]  __blk_mq_run_hw_queue+0x51/0xd0
> Jul 20 02:17:54 lamb kernel: [21061.429269]  __blk_mq_delay_run_hw_queue+0x141/0x160
> Jul 20 02:17:54 lamb kernel: [21061.429752]  blk_mq_sched_insert_requests+0x6a/0xf0
> Jul 20 02:17:54 lamb kernel: [21061.430233]  blk_mq_flush_plug_list+0x119/0x1b0
> Jul 20 02:17:54 lamb kernel: [21061.430756]  blk_flush_plug_list+0xd7/0x100
> Jul 20 02:17:54 lamb kernel: [21061.431241]  blk_finish_plug+0x21/0x30
> Jul 20 02:17:54 lamb kernel: [21061.431734]  dispatch_rw_block_io+0x6a5/0x9a0 [xen_blkback]
> Jul 20 02:17:54 lamb kernel: [21061.432220]  __do_block_io_op+0x31d/0x620 [xen_blkback]
> Jul 20 02:17:54 lamb kernel: [21061.432714]  ? _raw_spin_unlock_irqrestore+0x14/0x20
> Jul 20 02:17:54 lamb kernel: [21061.433193]  ? try_to_del_timer_sync+0x4d/0x80
> Jul 20 02:17:54 lamb kernel: [21061.433680]  xen_blkif_schedule+0xda/0x670 [xen_blkback]
> Jul 20 02:17:54 lamb kernel: [21061.434160]  ? __schedule+0x2c6/0x770
> Jul 20 02:17:54 lamb kernel: [21061.434679]  ? finish_wait+0x80/0x80
> Jul 20 02:17:54 lamb kernel: [21061.435129]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
> Jul 20 02:17:54 lamb kernel: [21061.435571]  kthread+0x116/0x130
> Jul 20 02:17:54 lamb kernel: [21061.436002]  ? kthread_park+0x80/0x80
> Jul 20 02:17:54 lamb kernel: [21061.436422]  ret_from_fork+0x22/0x30
> Jul 20 02:17:54 lamb kernel: [21061.436846] ---[ end trace 1d90be7aea2d9148 ]---
> Jul 20 02:17:54 lamb kernel: [21061.437250] blk_update_request: I/O error, dev nvme0n1, sector 912000815 op 0x1:(WRITE) flags 0x800 phys_seg 13 prio class 0
> Jul 20 02:17:54 lamb kernel: [21061.446344] md/raid1:md4: Disk failure on nvme0n1, disabling device.
> Jul 20 02:17:54 lamb kernel: [21061.446344] md/raid1:md4: Operation continuing on 1 devices.

I've tested and can confirm that this issue is fixed by:

    https://lore.kernel.org/patchwork/patch/1442338/

which is in 5.14 already and was also backported to 5.10.50.

Debian buster-backports currently ships 5.10.40 and bullseye has
5.10.46; Salvatore Bonaccorso said on the debian-kernel list that
they would backport this fix to all of the 5.10 packages.

Thanks,
Andy


