* BUG: bad page map under Xen
@ 2013-10-21 11:57 Lukas Hejtmanek
  2013-10-21 12:59 ` konrad wilk
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Lukas Hejtmanek @ 2013-10-21 11:57 UTC (permalink / raw)
  To: xen-devel; +Cc: roland, linux-rdma

Hello,

I'm trying to get SR-IOV working under Xen (4.2). It almost works, except for a
memory bug, which is easily reproducible in Dom0 alone.

I have a ConnectX-3 card with the latest firmware and the OFED 2.0-3 drivers. I
tried the 3.2 kernel from Debian, the 3.10 kernel from Debian and a vanilla
3.11.5 kernel. All behave the same.

As soon as I issue the ibv_devinfo command, it produces the following messages
in dmesg. With the ib_rdma_bw command the problem is worse: I get more of those
messages and, moreover, the OOM killer gets confused and kills almost all
processes.

[23502.645455] mlx4_core 0000:06:00.0: mlx4_ib: Port 1 logical link is up
[23550.181907] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
[23550.183822] swap_free: Unused swap offset entry 00000001
[23550.183868] BUG: Bad page map in process ibv_devinfo  pte:00000200 pmd:1b7df4067
[23550.183939] addr:00007f7ef5e18000 vm_flags:400844fa anon_vma:          (null) mapping:ffff8801b83c0480 index:380fe0882
[23550.184022] vma->vm_file->f_op->mmap: ib_uverbs_mmap+0x0/0x2d [ib_uverbs]
[23550.195382] Pid: 13813, comm: ibv_devinfo Tainted: G           O 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1+zs4
[23550.195461] Call Trace:
[23550.195508]  [<ffffffff810d9009>] ? print_bad_pte+0x1f5/0x20d
[23550.195553]  [<ffffffff810db083>] ? unmap_vmas+0x5fe/0x814
[23550.195601]  [<ffffffff810c68dd>] ? __add_page_to_lru_list+0x53/0x53
[23550.195647]  [<ffffffff810df2de>] ? unmap_region+0x9f/0x102
[23550.195694]  [<ffffffff8100d722>] ? __switch_to+0x23b/0x2b1
[23550.195741]  [<ffffffff8103d870>] ? pick_next_task_fair+0xfc/0x10c
[23550.195788]  [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[23550.195832]  [<ffffffff810e01f7>] ? do_munmap+0x281/0x2eb
[23550.195875]  [<ffffffff810e02a0>] ? sys_munmap+0x3f/0x55
[23550.195921]  [<ffffffff8136e51c>] ? system_call_fastpath+0x16/0x1b
[23550.195965] Disabling lock debugging due to kernel taint
[23550.196412] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
[23550.198303] swap_free: Unused swap offset entry 00000001
[23550.198348] BUG: Bad page map in process ibv_devinfo  pte:00000200 pmd:1b7df4067
[23550.198424] addr:00007f7ef5e18000 vm_flags:400844fa anon_vma:          (null) mapping:ffff8801b83c09a0 index:380fe0082
[23550.198508] vma->vm_file->f_op->mmap: ib_uverbs_mmap+0x0/0x2d [ib_uverbs]
[23550.198558] Pid: 13813, comm: ibv_devinfo Tainted: G    B      O 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1+zs4
[23550.198637] Call Trace:
[23550.198680]  [<ffffffff810d9009>] ? print_bad_pte+0x1f5/0x20d
[23550.198730]  [<ffffffff810db083>] ? unmap_vmas+0x5fe/0x814
[23550.198775]  [<ffffffff810c68dd>] ? __add_page_to_lru_list+0x53/0x53
[23550.198820]  [<ffffffff810df2de>] ? unmap_region+0x9f/0x102
[23550.198865]  [<ffffffff8100d6b0>] ? __switch_to+0x1c9/0x2b1
[23550.198913]  [<ffffffff8103d870>] ? pick_next_task_fair+0xfc/0x10c
[23550.198959]  [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[23550.199005]  [<ffffffff810e01f7>] ? do_munmap+0x281/0x2eb
[23550.199052]  [<ffffffff810e02a0>] ? sys_munmap+0x3f/0x55
[23550.199096]  [<ffffffff8136e51c>] ? system_call_fastpath+0x16/0x1b
[23550.199766] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
[23550.201661] swap_free: Unused swap offset entry 00000001
[23550.201706] BUG: Bad page map in process ibv_devinfo  pte:00000200 pmd:1b7df4067
[23550.201776] addr:00007f7ef5e18000 vm_flags:400844fa anon_vma:          (null) mapping:ffff8801b83c0ec0 index:380fdf882
[23550.201861] vma->vm_file->f_op->mmap: ib_uverbs_mmap+0x0/0x2d [ib_uverbs]
[23550.201908] Pid: 13813, comm: ibv_devinfo Tainted: G    B      O 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1+zs4
[23550.201990] Call Trace:
[23550.202032]  [<ffffffff810d9009>] ? print_bad_pte+0x1f5/0x20d
[23550.202081]  [<ffffffff810db083>] ? unmap_vmas+0x5fe/0x814
[23550.202125]  [<ffffffff810df2de>] ? unmap_region+0x9f/0x102
[23550.202169]  [<ffffffff8100d6b0>] ? __switch_to+0x1c9/0x2b1
[23550.202217]  [<ffffffff8103d870>] ? pick_next_task_fair+0xfc/0x10c
[23550.202267]  [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[23550.202312]  [<ffffffff810e01f7>] ? do_munmap+0x281/0x2eb
[23550.202355]  [<ffffffff810e02a0>] ? sys_munmap+0x3f/0x55
[23550.202398]  [<ffffffff8136e51c>] ? system_call_fastpath+0x16/0x1b
[23550.202925] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
[23550.213336] swap_free: Unused swap offset entry 00000001
[23550.213377] BUG: Bad page map in process ibv_devinfo  pte:00000200 pmd:1b7df4067
[23550.213448] addr:00007f7ef5e18000 vm_flags:400844fa anon_vma:          (null) mapping:ffff8801b6bd8ec0 index:380fdf082
[23550.213527] vma->vm_file->f_op->mmap: ib_uverbs_mmap+0x0/0x2d [ib_uverbs]
[23550.213573] Pid: 13813, comm: ibv_devinfo Tainted: G    B      O 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1+zs4
[23550.213651] Call Trace:
[23550.213775]  [<ffffffff810d9009>] ? print_bad_pte+0x1f5/0x20d
[23550.213820]  [<ffffffff810db083>] ? unmap_vmas+0x5fe/0x814
[23550.213863]  [<ffffffff810df2de>] ? unmap_region+0x9f/0x102
[23550.213907]  [<ffffffff8100d6b0>] ? __switch_to+0x1c9/0x2b1
[23550.213951]  [<ffffffff8103d870>] ? pick_next_task_fair+0xfc/0x10c
[23550.213996]  [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[23550.214041]  [<ffffffff810e01f7>] ? do_munmap+0x281/0x2eb
[23550.214084]  [<ffffffff810e02a0>] ? sys_munmap+0x3f/0x55
[23550.214127]  [<ffffffff8136e51c>] ? system_call_fastpath+0x16/0x1b
[23550.214461] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
[23550.215924] swap_free: Unused swap offset entry 00000001
[23550.215974] BUG: Bad page map in process ibv_devinfo  pte:00000200 pmd:1b7df4067
[23550.216049] addr:00007f7ef5e18000 vm_flags:400844fa anon_vma:          (null) mapping:ffff8801b8f381f0 index:380fff085
[23550.216133] vma->vm_file->f_op->mmap: ib_uverbs_mmap+0x0/0x2d [ib_uverbs]
[23550.216184] Pid: 13813, comm: ibv_devinfo Tainted: G    B      O 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1+zs4
[23550.216267] Call Trace:
[23550.216306]  [<ffffffff810d9009>] ? print_bad_pte+0x1f5/0x20d
[23550.216351]  [<ffffffff810db083>] ? unmap_vmas+0x5fe/0x814
[23550.216395]  [<ffffffff810df2de>] ? unmap_region+0x9f/0x102
[23550.216443]  [<ffffffff8100d6b0>] ? __switch_to+0x1c9/0x2b1
[23550.216487]  [<ffffffff8103d870>] ? pick_next_task_fair+0xfc/0x10c
[23550.216532]  [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[23550.216581]  [<ffffffff810e01f7>] ? do_munmap+0x281/0x2eb
[23550.216628]  [<ffffffff810e02a0>] ? sys_munmap+0x3f/0x55
[23550.216677]  [<ffffffff8136e51c>] ? system_call_fastpath+0x16/0x1b
[23550.216728] swap_free: Unused swap offset entry 00000001
[23550.216777] BUG: Bad page map in process ibv_devinfo  pte:00000200 pmd:1b7df4067
[23550.216846] addr:00007f7ef5e16000 vm_flags:400844fa anon_vma:          (null) mapping:ffff8801b8f381f0 index:380fff485
[23550.216925] vma->vm_file->f_op->mmap: ib_uverbs_mmap+0x0/0x2d [ib_uverbs]
[23550.216980] Pid: 13813, comm: ibv_devinfo Tainted: G    B      O 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1+zs4
[23550.217077] Call Trace:
[23550.217124]  [<ffffffff810d9009>] ? print_bad_pte+0x1f5/0x20d
[23550.217169]  [<ffffffff810db083>] ? unmap_vmas+0x5fe/0x814
[23550.217212]  [<ffffffff810df2de>] ? unmap_region+0x9f/0x102
[23550.217256]  [<ffffffff8100d6b0>] ? __switch_to+0x1c9/0x2b1
[23550.217300]  [<ffffffff8103d870>] ? pick_next_task_fair+0xfc/0x10c
[23550.217349]  [<ffffffff810e01f7>] ? do_munmap+0x281/0x2eb
[23550.217396]  [<ffffffff810e02a0>] ? sys_munmap+0x3f/0x55
[23550.217443]  [<ffffffff8136e51c>] ? system_call_fastpath+0x16/0x1b

This happens only when running under Xen. The native kernel of the same version is OK.

Is it a known bug, or is something wrong with the BIOS/firmware?
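
The "swap_free: Unused swap offset entry 00000001" line and the "pte:00000200"
value in each report appear to describe the same thing: a non-present PTE that
the kernel tries to interpret as a swap entry. A minimal user-space sketch of
that decoding follows. It assumes the x86-64 non-present PTE layout of kernels
of that era (present bit 0, swap type in bits 1-5, swap offset from bit 9
upwards); the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 layout of a non-present PTE in kernels of that era:
 * bit 0 = present, bits 1..5 = swap type, bits 9 and up = swap offset. */
#define PTE_PRESENT      0x1ULL
#define SWP_TYPE_SHIFT   1
#define SWP_TYPE_BITS    5
#define SWP_OFFSET_SHIFT 9

/* Hypothetical helper type: the two fields packed into a swap PTE. */
struct swap_fields {
    unsigned type;
    uint64_t offset;
};

/* Split a non-present PTE value into swap type and swap offset. */
static struct swap_fields decode_swap_pte(uint64_t pte)
{
    struct swap_fields f;

    assert((pte & PTE_PRESENT) == 0); /* only meaningful for non-present PTEs */
    f.type = (unsigned)((pte >> SWP_TYPE_SHIFT) & ((1u << SWP_TYPE_BITS) - 1));
    f.offset = pte >> SWP_OFFSET_SHIFT;
    return f;
}
```

Under that assumed layout, 0x200 decodes to swap type 0 and offset 1, which
would match the "entry 00000001" in the swap_free message.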

-- 
Lukáš Hejtmánek

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found] ` <20131021115740.GN20913-8qz54MUs51PtwjQa/ONI9g@public.gmane.org>
@ 2013-10-21 12:59   ` konrad wilk
       [not found]     ` <52652534.2040303-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2013-10-21 13:18     ` Jan Beulich
  2013-10-21 13:14   ` [Xen-devel] " Jan Beulich
  1 sibling, 2 replies; 40+ messages in thread
From: konrad wilk @ 2013-10-21 12:59 UTC (permalink / raw)
  To: Lukas Hejtmanek
  Cc: xen-devel-GuqFBffKawuEi8DpZVb4nw, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


On 10/21/2013 7:57 AM, Lukas Hejtmanek wrote:
> Hello,
>
> I'm trying to get SR-IOV working under Xen (4.2). It almost works except
> memory bug. This is easily reproducible just in Dom0.
>
> I have Connect-X3 card with the latest firmware. OFED 2.0-3 drivers. I tried
> 3.2 kernel from Debian, 3.10 kernel from Debian and vanila 3.11.5 kernel. All
> are the same.

Ha! Funny you mention that. I had been looking at this.
> As soon as I issue ibv_devinfo command, it produces the following messages
> into dmesg. Problem is that with ib_rdma_bw command, I get more of those
> messages and moreover, oom killer gets confused and kills almost all
> processes.
>
> [23502.645455] mlx4_core 0000:06:00.0: mlx4_ib: Port 1 logical link is up
> [23550.181907] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
> [23550.183822] swap_free: Unused swap offset entry 00000001
> [23550.183868] BUG: Bad page map in process ibv_devinfo  pte:00000200 pmd:1b7df4067
> [23550.183939] addr:00007f7ef5e18000 vm_flags:400844fa anon_vma:          (null) mapping:ffff8801b83c0480 index:380fe0882
> [23550.184022] vma->vm_file->f_op->mmap: ib_uverbs_mmap+0x0/0x2d [ib_uverbs]
> [23550.195382] Pid: 13813, comm: ibv_devinfo Tainted: G           O 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1+zs4
> [23550.195461] Call Trace:
> [23550.195508]  [<ffffffff810d9009>] ? print_bad_pte+0x1f5/0x20d
> [23550.195553]  [<ffffffff810db083>] ? unmap_vmas+0x5fe/0x814
> [23550.195601]  [<ffffffff810c68dd>] ? __add_page_to_lru_list+0x53/0x53
> [23550.195647]  [<ffffffff810df2de>] ? unmap_region+0x9f/0x102
> [23550.195694]  [<ffffffff8100d722>] ? __switch_to+0x23b/0x2b1
> [23550.195741]  [<ffffffff8103d870>] ? pick_next_task_fair+0xfc/0x10c
> [23550.195788]  [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
> [23550.195832]  [<ffffffff810e01f7>] ? do_munmap+0x281/0x2eb
> [23550.195875]  [<ffffffff810e02a0>] ? sys_munmap+0x3f/0x55
> [23550.195921]  [<ffffffff8136e51c>] ? system_call_fastpath+0x16/0x1b
> [23550.195965] Disabling lock debugging due to kernel taint
> [23550.196412] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
> [23550.198303] swap_free: Unused swap offset entry 00000001
> [23550.198348] BUG: Bad page map in process ibv_devinfo  pte:00000200 pmd:1b7df4067
> [23550.198424] addr:00007f7ef5e18000 vm_flags:400844fa anon_vma:          (null) mapping:ffff8801b83c09a0 index:380fe0082
..
> this happens only if running under Xen. Native kernel in the same version is OK.
>
> Is it a known bug or is something wrong with BIOS/firmware?
>
I believe it is a bug in the drivers. The issue is that the mapping created
for the second mmap call is done without VM_IO and on a PFN that is RAM (and
not the BAR). But I am not entirely sure; hopefully this week I will have a
better idea and a fix. Stay tuned.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [Xen-devel] BUG: bad page map under Xen
       [not found] ` <20131021115740.GN20913-8qz54MUs51PtwjQa/ONI9g@public.gmane.org>
  2013-10-21 12:59   ` [Xen-devel] " konrad wilk
@ 2013-10-21 13:14   ` Jan Beulich
  1 sibling, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-21 13:14 UTC (permalink / raw)
  To: Lukas Hejtmanek
  Cc: David Vrabel, roland-DgEjT+Ai2ygdnm+yROfE0A, xen-devel,
	Boris Ostrovsky, Konrad Rzeszutek Wilk,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

>>> On 21.10.13 at 13:57, Lukas Hejtmanek <xhejtman-8qz54MUs51PtwjQa/ONI9g@public.gmane.org> wrote:
> I'm trying to get SR-IOV working under Xen (4.2). It almost works except
> memory bug. This is easily reproducible just in Dom0. 

So without any SR-IOV then, I suppose?

> [23502.645455] mlx4_core 0000:06:00.0: mlx4_ib: Port 1 logical link is up
> [23550.181907] <mlx4_ib> check_flow_steering_support: Device managed flow 
> steering is unavailable for IB port in multifunction env.
> [23550.183822] swap_free: Unused swap offset entry 00000001
> [23550.183868] BUG: Bad page map in process ibv_devinfo  pte:00000200 
> pmd:1b7df4067
> [23550.183939] addr:00007f7ef5e18000 vm_flags:400844fa anon_vma:          
> (null) mapping:ffff8801b83c0480 index:380fe0882
> [23550.184022] vma->vm_file->f_op->mmap: ib_uverbs_mmap+0x0/0x2d [ib_uverbs]

Looking at ib_uverbs_mmap() and its necessary (for mlx4)
descendant mlx4_ib_mmap() I see that the latter calls
io_remap_pfn_range(), but afaict there's nowhere _PAGE_IOMAP
would get set here (as opposed to
arch/x86/pci/i386.c:pci_mmap_page_range() for example). Could
you check whether adding that flag helps? (I'm copying the kernel
maintainers so that they can correct me if I'm wrong here - it
would seem to me that this could equally be the reason why
there are other reports of certain things not working as expected
in domains with more than 4GB.)
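
As a rough illustration of what "adding that flag" amounts to at the bit
level, here is a user-space sketch. It assumes that _PAGE_IOMAP is software
bit 10 of the x86 page protection value in kernels of that era; the type and
the helper names (pgprot_bits, mark_iomap, has_iomap) are stand-ins invented
for this example and are not kernel API.

```c
#include <stdint.h>

/* Stand-in for the kernel's pgprot_t; values here are plain integers. */
typedef uint64_t pgprot_bits;

/* Assumption: _PAGE_IOMAP is software bit 10 on x86 in these kernels. */
#define PAGE_IOMAP_BIT (1ULL << 10)

/* Return the protection bits with the I/O-mapping marker added. */
static pgprot_bits mark_iomap(pgprot_bits prot)
{
    return prot | PAGE_IOMAP_BIT;
}

/* Report whether the marker is already present. */
static int has_iomap(pgprot_bits prot)
{
    return (prot & PAGE_IOMAP_BIT) != 0;
}
```

In the driver, the equivalent would presumably be applied to
vma->vm_page_prot before the io_remap_pfn_range() call, in the style of
pci_mmap_page_range(); that placement is untested here.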

You could also consider trying an openSUSE kernel - there, unlike
upstream, there's no need for each and every caller of
io_remap_pfn_range() to take care of setting _PAGE_IOMAP (and
I vaguely recall having discussed this a couple of years back with
Konrad et al).

Jan


^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]     ` <52652534.2040303-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2013-10-21 13:18       ` Jan Beulich
       [not found]         ` <526545E002000078000FC5F1-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
  2013-10-21 13:39         ` konrad wilk
  0 siblings, 2 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-21 13:18 UTC (permalink / raw)
  To: konrad wilk
  Cc: Lukas Hejtmanek, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

>>> On 21.10.13 at 14:59, konrad wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> It is a bug in the drivers I believe. The issue is that the mapping 
> created for the second mmap
> call is done without VM_IO and on an PFN that is RAM (and not the BAR). 

So while putting together the reply that I had sent to Lukas a
minute ago I was actually hunting for that VM_IO -> _PAGE_IOMAP
translation, and wasn't able to find it anywhere. As you say it
nevertheless exists - what am I overlooking (and why would then
pci_mmap_page_range() nevertheless have to set _PAGE_IOMAP
by hand)?

Jan


^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]         ` <526545E002000078000FC5F1-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
@ 2013-10-21 13:39           ` konrad wilk
       [not found]             ` <52652E95.3020305-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
                               ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: konrad wilk @ 2013-10-21 13:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Lukas Hejtmanek, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


On 10/21/2013 9:18 AM, Jan Beulich wrote:
>>>> On 21.10.13 at 14:59, konrad wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>> It is a bug in the drivers I believe. The issue is that the mapping
>> created for the second mmap
>> call is done without VM_IO and on an PFN that is RAM (and not the BAR).
> So while putting together the reply that I had sent to Lukas a
> minute ago I was actually hunting for that VM_IO -> _PAGE_IOMAP
> translation, and wasn't able to find it anywhere. As you say it
> nevertheless exists - what am I overlooking (and why would then
> pci_mmap_page_range() nevertheless have to set _PAGE_IOMAP
> by hand)?

The P2M (arch/x86/xen/p2m.c) is consulted, which for the MMIO gaps and
E820_RESV has the MFNs set to the PFN. This is the 1-1 pfn/mfn stuff that I
implemented some time ago - as hpa was opposed to having _PAGE_IOMAP stuck
on every macro call to pgprot_writecombine|noncached|etc. Or perhaps that was
on the arch_something_prot.

Anyhow, the odd thing is that looking at the code:

        if (io_remap_pfn_range(vma, vma->vm_start,
                               to_mucontext(context)->uar.pfn +
                               dev->dev->caps.num_uars,
                               PAGE_SIZE, vma->vm_page_prot))

The PFN in question (uar.pfn) is set in mlx4_uar_alloc to:

        uar->pfn = (pci_resource_start(dev->pdev, 2) >> PAGE_SHIFT) + offset;
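
To make the arithmetic in that assignment concrete, here is a small
user-space sketch. PAGE_SHIFT of 12 and any BAR address fed to it are
assumptions for illustration; the helper names uar_pfn() and pfn_above_4g()
are made up and are not part of the mlx4 driver.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages, as on x86 */

/* Mirror of the quoted expression: BAR start converted to a page frame
 * number, plus the index of the UAR page within the BAR. */
static uint64_t uar_pfn(uint64_t bar_start, uint64_t offset)
{
    return (bar_start >> PAGE_SHIFT) + offset;
}

/* Nonzero if the page frame lies at or above the 4GB boundary. */
static int pfn_above_4g(uint64_t pfn)
{
    return pfn >= (1ULL << (32 - PAGE_SHIFT));
}
```

For a hypothetical BAR at 0xf0000000 and offset 3, uar_pfn() gives 0xf0003,
which is below the 4GB boundary (frame 0x100000). If the index field of the
bad-page report (380fe0882) is read as a page frame number, which is an
assumption and not something the thread confirms, pfn_above_4g() is true
for it.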

So is the BAR not in the MMIO region? Or is it the 64-bit type of MMIO that
lies outside the 4GB, so that when the P2M is consulted it thinks it's
INVALID_P2M_ENTRY?

Which comes back to the bug you (Jan) discovered when you pointed out that
PVH needs to set up MMIO entries for 64-bit MMIO regions, which can be outside
the 4GB region <sigh>. And that is something the pvops kernel completely
ignores, as it assumes that any region past the E820 can be used for
ballooning.

Anyhow, one easy thing to figure out is to get the lspci -v output for the
InfiniBand card to see where its BARs are, and also the start of the kernel
log. You should see an E820 map (please also boot with "debug" on the Linux
command line).

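The hypothesis above can be sketched as a toy model. This is not the code in
arch/x86/xen/p2m.c: the region table, the made-up RAM translation (+0x100000)
and the function toy_p2m_lookup() are all assumptions, chosen only to show
how a frame past the end of the E820 map could come back as
INVALID_P2M_ENTRY while reserved ranges and MMIO gaps below it are 1-1.

```c
#include <stddef.h>
#include <stdint.h>

#define INVALID_P2M_ENTRY (~0ULL)

enum region_type { REGION_RAM, REGION_RESERVED };

/* One entry of a simplified, hypothetical E820-style map, in page frames. */
struct region {
    uint64_t start_pfn;
    uint64_t end_pfn; /* exclusive */
    enum region_type type;
};

/* Toy lookup: RAM frames get a made-up machine frame, reserved ranges and
 * gaps below the end of the map are identity mapped, and anything past the
 * end of the map is treated as not present. */
static uint64_t toy_p2m_lookup(const struct region *map, size_t n, uint64_t pfn)
{
    uint64_t map_end = 0;
    size_t i;

    /* The highest frame described by the map marks its end. */
    for (i = 0; i < n; i++)
        if (map[i].end_pfn > map_end)
            map_end = map[i].end_pfn;

    if (pfn >= map_end)
        return INVALID_P2M_ENTRY;

    for (i = 0; i < n; i++)
        if (pfn >= map[i].start_pfn && pfn < map[i].end_pfn)
            return map[i].type == REGION_RAM ? pfn + 0x100000 : pfn;

    return pfn; /* hole between entries: treated as an MMIO gap, 1-1 */
}
```

With a map that ends at 8GB, a BAR frame below 4GB falls into a gap or a
reserved range and is returned unchanged, while a frame of a 64-bit BAR
placed far above the map is not. Comparing the lspci -v and E820 output
requested above would show which case applies on the real machine.
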
> Jan
>


^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]             ` <52652E95.3020305-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2013-10-21 13:57               ` konrad wilk
  2013-10-21 14:06               ` Lukas Hejtmanek
  1 sibling, 0 replies; 40+ messages in thread
From: konrad wilk @ 2013-10-21 13:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Lukas Hejtmanek, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


On 10/21/2013 9:39 AM, konrad wilk wrote:
>
> On 10/21/2013 9:18 AM, Jan Beulich wrote:
>>>>> On 21.10.13 at 14:59, konrad wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>> It is a bug in the drivers I believe. The issue is that the mapping
>>> created for the second mmap
>>> call is done without VM_IO and on an PFN that is RAM (and not the BAR).
>> So while putting together the reply that I had sent to Lukas a
>> minute ago I was actually hunting for that VM_IO -> _PAGE_IOMAP
>> translation, and wasn't able to find it anywhere. As you say it
>> nevertheless exists - what am I overlooking (and why would then
>> pci_mmap_page_range() nevertheless have to set _PAGE_IOMAP
>> by hand)?
>
> The P2M (arch/x86/xen/p2m.c) is consulted, which for the MMIO gaps and
> E820_RESV has the MFNs set to the PFN. This is the 1-1 pfn/mfn stuff that I
> implemented some time ago - as hpa was opposed to having _PAGE_IOMAP stuck
> on every macro call to pgprot_writecombine|noncached|etc. Or perhaps that
> was on the arch_something_prot.

This is the one that Jeremy cooked up some time ago:
http://lkml.indiana.edu/hypermail/linux/kernel/1010.2/03012.html

And here was the thread:
http://www.spinics.net/lists/linux-rdma/msg07085.html

which I thought had been fixed by the P2M identity code.
>
> Anyhow, the odd thing is that looking at the code:
>
>         if (io_remap_pfn_range(vma, vma->vm_start,
>                                to_mucontext(context)->uar.pfn +
>                                dev->dev->caps.num_uars,
>                                PAGE_SIZE, vma->vm_page_prot))
>
> The PFN in question (uar.pfn) is set in mlx4_uar_alloc to:
>
>         uar->pfn = (pci_resource_start(dev->pdev, 2) >> PAGE_SHIFT) + offset;
>
> So is the BAR not in the MMIO region? Or is it the 64-bit type of MMIO that
> lies outside the 4GB, so that when the P2M is consulted it thinks it's
> INVALID_P2M_ENTRY?
>
> Which comes back to the bug you (Jan) discovered when you pointed out that
> PVH needs to set up MMIO entries for 64-bit MMIO regions, which can be
> outside the 4GB region <sigh>. And that is something the pvops kernel
> completely ignores, as it assumes that any region past the E820 can be used
> for ballooning.
>
> Anyhow, one easy thing to figure out is to get the lspci -v output for the
> InfiniBand card to see where its BARs are, and also the start of the kernel
> log. You should see an E820 map (please also boot with "debug" on the Linux
> command line).
>
>> Jan
>>
>


^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]             ` <52652E95.3020305-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2013-10-21 13:57               ` konrad wilk
@ 2013-10-21 14:06               ` Lukas Hejtmanek
  2013-10-21 14:18                 ` Konrad Rzeszutek Wilk
                                   ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Lukas Hejtmanek @ 2013-10-21 14:06 UTC (permalink / raw)
  To: konrad wilk
  Cc: Jan Beulich, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Oct 21, 2013 at 09:39:33AM -0400, konrad wilk wrote:
> Anyhow, one easy thing to figure out is to get the lspci -v output
> from the InfiniBand card
> to see where its BARs are, and also the start of the kernel. You
> should see an E820 map (please also boot with
> "debug" on the Linux command line).

Note: adding _PAGE_IOMAP, as Jan suggested, fixed those memory errors.

Here is the lspci output for the card and its virtual functions.

06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Subsystem: Mellanox Technologies Device 0017
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 42
        Region 0: Memory at dfa00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at df900000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Vital Product Data
                Product Name: CX353A - ConnectX-3 QSFP
                Read-only fields:
                        [PN] Part number: MCX353A-QCBT         
                        [EC] Engineering changes: A4
                        [SN] Serial number: MT1327X00814            
                        [V0] Vendor specific: PCIe Gen3 x8    
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                Read/write fields:
                        [V1] Vendor specific: N/A   
                        [YA] Asset tag: N/A                     
                        [RW] Read-write area: 105 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 252 byte(s) free
                End
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
                Vector table: BAR=0 offset=0007c000
                PBA: BAR=0 offset=0007d000
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [148 v1] Device Serial Number 00-02-c9-03-00-b6-fc-70
        Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 64, Total VFs: 64, Number of VFs: 4, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: 1004
                Supported Page Size: 000007ff, System Page Size: 00000001
                Region 2: Memory at 0000380fdf000000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [154 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [18c v1] #19
        Kernel driver in use: mlx4_core

06:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
        Subsystem: Mellanox Technologies Device 61b0
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 2: [virtual] Memory at 380fdf000000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=2 offset=00002000
                PBA: BAR=2 offset=00003000
        Kernel driver in use: mlx4_core

06:00.2 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
        Subsystem: Mellanox Technologies Device 61b0
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 2: [virtual] Memory at 380fdf800000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=2 offset=00002000
                PBA: BAR=2 offset=00003000
        Kernel driver in use: mlx4_core

06:00.3 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
        Subsystem: Mellanox Technologies Device 61b0
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 2: [virtual] Memory at 380fe0000000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=2 offset=00002000
                PBA: BAR=2 offset=00003000
        Kernel driver in use: mlx4_core

06:00.4 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
        Subsystem: Mellanox Technologies Device 61b0
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 2: [virtual] Memory at 380fe0800000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=2 offset=00002000
                PBA: BAR=2 offset=00003000
        Kernel driver in use: mlx4_core

And this is from dmesg:

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x0000000000090fff] usable
[    0.000000] Xen: [mem 0x0000000000091800-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x000000007dd76fff] usable
[    0.000000] Xen: [mem 0x000000007dd77000-0x000000007ddb5fff] reserved
[    0.000000] Xen: [mem 0x000000007ddb6000-0x000000007debefff] ACPI data
[    0.000000] Xen: [mem 0x000000007debf000-0x000000007e0dafff] ACPI NVS
[    0.000000] Xen: [mem 0x000000007e0db000-0x000000007f357fff] reserved
[    0.000000] Xen: [mem 0x000000007f358000-0x000000007f7fffff] ACPI NVS
[    0.000000] Xen: [mem 0x0000000080000000-0x000000008fffffff] reserved
[    0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec01fff] reserved
[    0.000000] Xen: [mem 0x00000000fec40000-0x00000000fec40fff] reserved
[    0.000000] Xen: [mem 0x00000000fed1c000-0x00000000fed3ffff] reserved
[    0.000000] Xen: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] Xen: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] Xen: [mem 0x0000000100000000-0x000000107fffffff] usable

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x1080000 max_arch_pfn = 0x400000000
[    0.000000] e820: last_pfn = 0x7dd77 max_arch_pfn = 0x400000000
[    0.000000] e820: [mem 0x90000000-0xfebfffff] available for PCI devices
[   23.917733] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
[   24.587366] e820: reserve RAM buffer [mem 0x00091000-0x0009ffff]
[   24.587468] e820: reserve RAM buffer [mem 0x7dd77000-0x7fffffff]
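
To relate the BARs to the E820 map, the addresses can be compared directly. The sketch below is only an illustration under stated assumptions: the table is copied from the dmesg excerpt above, the BAR addresses from the lspci output, and the helper names (e820_type, past_e820_end) are hypothetical and not kernel functions.

```python
from typing import Optional

# E820 entries copied from the dmesg excerpt: (start, end_inclusive, type).
E820 = [
    (0x0000000000000000, 0x0000000000090FFF, "usable"),
    (0x0000000000091800, 0x00000000000FFFFF, "reserved"),
    (0x0000000000100000, 0x000000007DD76FFF, "usable"),
    (0x000000007DD77000, 0x000000007DDB5FFF, "reserved"),
    (0x000000007DDB6000, 0x000000007DEBEFFF, "ACPI data"),
    (0x000000007DEBF000, 0x000000007E0DAFFF, "ACPI NVS"),
    (0x000000007E0DB000, 0x000000007F357FFF, "reserved"),
    (0x000000007F358000, 0x000000007F7FFFFF, "ACPI NVS"),
    (0x0000000080000000, 0x000000008FFFFFFF, "reserved"),
    (0x00000000FEC00000, 0x00000000FEC01FFF, "reserved"),
    (0x00000000FEC40000, 0x00000000FEC40FFF, "reserved"),
    (0x00000000FED1C000, 0x00000000FED3FFFF, "reserved"),
    (0x00000000FEE00000, 0x00000000FEE00FFF, "reserved"),
    (0x00000000FF000000, 0x00000000FFFFFFFF, "reserved"),
    (0x0000000100000000, 0x000000107FFFFFFF, "usable"),
]

# Highest address covered by any E820 entry.
E820_END = max(end for _, end, _ in E820)


def e820_type(addr: int) -> Optional[str]:
    """Return the E820 type covering addr, or None if it falls in a hole."""
    for start, end, kind in E820:
        if start <= addr <= end:
            return kind
    return None


def past_e820_end(addr: int) -> bool:
    """True if addr lies beyond the last E820 entry."""
    return addr > E820_END


# BARs of 06:00.0 from the lspci output.
for name, bar in (("Region 0", 0xDFA00000), ("Region 2", 0x380FFF000000)):
    print(name, hex(bar), e820_type(bar), past_e820_end(bar))
```

With this table, Region 0 falls in the hole below 4GB that dmesg reports as available for PCI devices, while Region 2 lies beyond the end of the E820 map.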

Do you need anything else?

-- 
Lukáš Hejtmánek

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-21 13:39           ` konrad wilk
       [not found]             ` <52652E95.3020305-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2013-10-21 13:57             ` konrad wilk
@ 2013-10-21 14:06             ` Lukas Hejtmanek
  2 siblings, 0 replies; 40+ messages in thread
From: Lukas Hejtmanek @ 2013-10-21 14:06 UTC (permalink / raw)
  To: konrad wilk; +Cc: roland, linux-rdma, Jan Beulich, xen-devel

On Mon, Oct 21, 2013 at 09:39:33AM -0400, konrad wilk wrote:
> Anyhow, one easy thing to figure out is to get the lspci -v output
> from the InfiniBand card
> to see where its BARs are, and also the start of the kernel. You
> should see an E820 map (please also boot with
> "debug" on the Linux command line).

Note: adding _PAGE_IOMAP, as Jan suggested, fixed those memory errors.

Here is the lspci output for the card and its virtual functions.

06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Subsystem: Mellanox Technologies Device 0017
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 42
        Region 0: Memory at dfa00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at df900000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Vital Product Data
                Product Name: CX353A - ConnectX-3 QSFP
                Read-only fields:
                        [PN] Part number: MCX353A-QCBT         
                        [EC] Engineering changes: A4
                        [SN] Serial number: MT1327X00814            
                        [V0] Vendor specific: PCIe Gen3 x8    
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                Read/write fields:
                        [V1] Vendor specific: N/A   
                        [YA] Asset tag: N/A                     
                        [RW] Read-write area: 105 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 253 byte(s) free
                        [RW] Read-write area: 252 byte(s) free
                End
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
                Vector table: BAR=0 offset=0007c000
                PBA: BAR=0 offset=0007d000
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [148 v1] Device Serial Number 00-02-c9-03-00-b6-fc-70
        Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 64, Total VFs: 64, Number of VFs: 4, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: 1004
                Supported Page Size: 000007ff, System Page Size: 00000001
                Region 2: Memory at 0000380fdf000000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [154 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [18c v1] #19
        Kernel driver in use: mlx4_core

06:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
        Subsystem: Mellanox Technologies Device 61b0
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 2: [virtual] Memory at 380fdf000000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=2 offset=00002000
                PBA: BAR=2 offset=00003000
        Kernel driver in use: mlx4_core

06:00.2 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
        Subsystem: Mellanox Technologies Device 61b0
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 2: [virtual] Memory at 380fdf800000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=2 offset=00002000
                PBA: BAR=2 offset=00003000
        Kernel driver in use: mlx4_core

06:00.3 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
        Subsystem: Mellanox Technologies Device 61b0
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 2: [virtual] Memory at 380fe0000000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=2 offset=00002000
                PBA: BAR=2 offset=00003000
        Kernel driver in use: mlx4_core

06:00.4 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
        Subsystem: Mellanox Technologies Device 61b0
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 2: [virtual] Memory at 380fe0800000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=2 offset=00002000
                PBA: BAR=2 offset=00003000
        Kernel driver in use: mlx4_core

and this is from dmesg:

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x0000000000090fff] usable
[    0.000000] Xen: [mem 0x0000000000091800-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x000000007dd76fff] usable
[    0.000000] Xen: [mem 0x000000007dd77000-0x000000007ddb5fff] reserved
[    0.000000] Xen: [mem 0x000000007ddb6000-0x000000007debefff] ACPI data
[    0.000000] Xen: [mem 0x000000007debf000-0x000000007e0dafff] ACPI NVS
[    0.000000] Xen: [mem 0x000000007e0db000-0x000000007f357fff] reserved
[    0.000000] Xen: [mem 0x000000007f358000-0x000000007f7fffff] ACPI NVS
[    0.000000] Xen: [mem 0x0000000080000000-0x000000008fffffff] reserved
[    0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec01fff] reserved
[    0.000000] Xen: [mem 0x00000000fec40000-0x00000000fec40fff] reserved
[    0.000000] Xen: [mem 0x00000000fed1c000-0x00000000fed3ffff] reserved
[    0.000000] Xen: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] Xen: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] Xen: [mem 0x0000000100000000-0x000000107fffffff] usable

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x1080000 max_arch_pfn = 0x400000000
[    0.000000] e820: last_pfn = 0x7dd77 max_arch_pfn = 0x400000000
[    0.000000] e820: [mem 0x90000000-0xfebfffff] available for PCI devices
[   23.917733] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
[   24.587366] e820: reserve RAM buffer [mem 0x00091000-0x0009ffff]
[   24.587468] e820: reserve RAM buffer [mem 0x7dd77000-0x7fffffff]

Do you need anything else?

-- 
Lukáš Hejtmánek

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-21 14:06               ` Lukas Hejtmanek
@ 2013-10-21 14:18                 ` Konrad Rzeszutek Wilk
  2013-10-21 14:23                   ` Lukas Hejtmanek
                                     ` (2 more replies)
       [not found]                 ` <20131021140607.GQ20913-8qz54MUs51PtwjQa/ONI9g@public.gmane.org>
  2013-10-21 14:20                 ` Jan Beulich
  2 siblings, 3 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-21 14:18 UTC (permalink / raw)
  To: Lukas Hejtmanek; +Cc: roland, linux-rdma, Jan Beulich, xen-devel

On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
> On Mon, Oct 21, 2013 at 09:39:33AM -0400, konrad wilk wrote:
> > Anyhow, one easy thing to figure out is to get the lspci -v output
> > from the InfiniBand card
> > to see where its BARs are, and also the start of the kernel. You
> > should see an E820 map (please also boot with
> > "debug" on the Linux command line).
> 
> note, adding _PAGE_IO as Jan suggested fixed those mem errors.

<nods> Right.
> 
> here is lspci from the card and its virtual functions.
> 
> 06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>         Subsystem: Mellanox Technologies Device 0017
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 42
>         Region 0: Memory at dfa00000 (64-bit, non-prefetchable) [size=1M]
>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]

Wow.

> 06:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
>         Subsystem: Mellanox Technologies Device 61b0
>         Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Region 2: [virtual] Memory at 380fdf000000 (64-bit, prefetchable) [size=8M]

Wow again.

.. snip..
> and this is from dmesg:
> 
> [    0.000000] e820: BIOS-provided physical RAM map:
> [    0.000000] Xen: [mem 0x0000000000000000-0x0000000000090fff] usable
> [    0.000000] Xen: [mem 0x0000000000091800-0x00000000000fffff] reserved
> [    0.000000] Xen: [mem 0x0000000000100000-0x000000007dd76fff] usable
> [    0.000000] Xen: [mem 0x000000007dd77000-0x000000007ddb5fff] reserved
> [    0.000000] Xen: [mem 0x000000007ddb6000-0x000000007debefff] ACPI data
> [    0.000000] Xen: [mem 0x000000007debf000-0x000000007e0dafff] ACPI NVS
> [    0.000000] Xen: [mem 0x000000007e0db000-0x000000007f357fff] reserved
> [    0.000000] Xen: [mem 0x000000007f358000-0x000000007f7fffff] ACPI NVS
> [    0.000000] Xen: [mem 0x0000000080000000-0x000000008fffffff] reserved
> [    0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec01fff] reserved
> [    0.000000] Xen: [mem 0x00000000fec40000-0x00000000fec40fff] reserved
> [    0.000000] Xen: [mem 0x00000000fed1c000-0x00000000fed3ffff] reserved
> [    0.000000] Xen: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
> [    0.000000] Xen: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
> [    0.000000] Xen: [mem 0x0000000100000000-0x000000107fffffff] usable

Odd, there should be messages about 1-1 mapping when you use 'debug'.

But either way - the problem (bug) is what I suspected - we treat any region
past the end of the E820 map as INVALID_P2M_ENTRY, and hence any set_pte(..)
operation on it will fetch a 0 value, which in turn means that the PTE is zero
(with the 0x200 _PAGE_SPECIAL bit set b/c of VMA tracking).

Now the fix is to determine _where_ the end of real memory is so that we
can make sure that ballooning will work (in case of dom0_mem_max parameter).
And then anything past that PFN can be treated as IDENTITY_FRAME.
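
To make the failure mode concrete, here is a rough user-space model of what
is described above. It is only a sketch, not the kernel's actual code path:
the toy_* names, the made-up RAM translation, and the simplification that
everything below last_pfn is RAM are all assumptions for illustration. The
pfn used is the "index:380fe0882" value from the original report.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12
#define TOY_PAGE_SPECIAL 0x200ULL      /* the 0x200 bit mentioned above */
#define TOY_INVALID_MFN  (~0ULL)       /* stand-in for INVALID_P2M_ENTRY */
#define TOY_LAST_PFN     0x1080000ULL  /* last_pfn from the dmesg */

/*
 * Toy P2M (assumption): every pfn below last_pfn is RAM with a made-up
 * translation; everything past it has no entry at all.
 */
static uint64_t toy_pfn_to_mfn(uint64_t pfn)
{
	if (pfn < TOY_LAST_PFN)
		return pfn + 0x1000;   /* arbitrary, illustration only */
	return TOY_INVALID_MFN;
}

/* Build a "special" PTE for a pfn; a missing translation yields frame 0. */
static uint64_t toy_make_special_pte(uint64_t pfn)
{
	uint64_t mfn = toy_pfn_to_mfn(pfn);

	if (mfn == TOY_INVALID_MFN)
		mfn = 0;
	return (mfn << TOY_PAGE_SHIFT) | TOY_PAGE_SPECIAL;
}

/* Worked example using the pfn from the bad page map report. */
static void toy_example(void)
{
	/* A pfn inside the VF BAR has no P2M entry: only 0x200 is left. */
	assert(toy_make_special_pte(0x380fe0882ULL) == 0x200ULL);
	/* A RAM pfn keeps its (made-up) machine frame. */
	assert(toy_make_special_pte(0x1000ULL) ==
	       ((0x2000ULL << TOY_PAGE_SHIFT) | TOY_PAGE_SPECIAL));
}
```

In this model, marking the pfns past the end of RAM as identity instead of
invalid would make the first case come out as a usable frame number.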

Naively, I think this patch would do it:

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 09f3059..3871554 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
 
                __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
        }
+       /* Anything past the balloon area is marked as identity. */
+       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
+               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
 }
 
 static unsigned long __init xen_do_chunk(unsigned long start,

But this is not even compile tested :-(

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                 ` <20131021140607.GQ20913-8qz54MUs51PtwjQa/ONI9g@public.gmane.org>
@ 2013-10-21 14:20                   ` Jan Beulich
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-21 14:20 UTC (permalink / raw)
  To: Lukas Hejtmanek, konrad wilk
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

>>> On 21.10.13 at 16:06, Lukas Hejtmanek <xhejtman-8qz54MUs51PtwjQa/ONI9g@public.gmane.org> wrote:
> here is lspci from the card and its virtual functions.
> 
> 06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>         Subsystem: Mellanox Technologies Device 0017
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 42
>         Region 0: Memory at dfa00000 (64-bit, non-prefetchable) [size=1M]
>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]

Which confirms what Konrad said regarding MMIO above 4Gb.
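
For reference, the arithmetic behind this can be checked with a few lines
of C. This is only a sketch: the sketch_* helper names are made up, 4K pages
are assumed, and the last_pfn value is the larger of the two values in the
dmesg excerpt posted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12            /* assumes 4K pages */
#define SKETCH_LAST_PFN   0x1080000ULL  /* e820: last_pfn from the dmesg */
#define SKETCH_4GB_PFN    0x100000ULL   /* 4GB >> 12 */

static uint64_t sketch_addr_to_pfn(uint64_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

/* True if the BAR starts at or beyond the last pfn covered by the E820 map. */
static bool sketch_bar_past_e820(uint64_t bar_start)
{
	return sketch_addr_to_pfn(bar_start) >= SKETCH_LAST_PFN;
}

static void sketch_example(void)
{
	/* Region 2 of 06:00.0: far above both 4GB and the end of RAM. */
	assert(sketch_addr_to_pfn(0x380fff000000ULL) == 0x380fff000ULL);
	assert(sketch_addr_to_pfn(0x380fff000000ULL) > SKETCH_4GB_PFN);
	assert(sketch_bar_past_e820(0x380fff000000ULL));
	/* Region 0 at dfa00000 sits below 4GB, in the PCI hole. */
	assert(!sketch_bar_past_e820(0xdfa00000ULL));
}
```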

Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-21 14:06               ` Lukas Hejtmanek
  2013-10-21 14:18                 ` Konrad Rzeszutek Wilk
       [not found]                 ` <20131021140607.GQ20913-8qz54MUs51PtwjQa/ONI9g@public.gmane.org>
@ 2013-10-21 14:20                 ` Jan Beulich
  2 siblings, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-21 14:20 UTC (permalink / raw)
  To: Lukas Hejtmanek, konrad wilk; +Cc: roland, linux-rdma, xen-devel

>>> On 21.10.13 at 16:06, Lukas Hejtmanek <xhejtman@ics.muni.cz> wrote:
> here is lspci from the card and its virtual functions.
> 
> 06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>         Subsystem: Mellanox Technologies Device 0017
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 42
>         Region 0: Memory at dfa00000 (64-bit, non-prefetchable) [size=1M]
>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]

Which confirms what Konrad said regarding MMIO above 4Gb.

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                   ` <20131021141855.GA4211-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2013-10-21 14:23                     ` Lukas Hejtmanek
  2013-10-21 14:27                     ` Jan Beulich
  1 sibling, 0 replies; 40+ messages in thread
From: Lukas Hejtmanek @ 2013-10-21 14:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jan Beulich, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Oct 21, 2013 at 10:18:55AM -0400, Konrad Rzeszutek Wilk wrote:
> Odd, there should be messages about 1-1 mapping when you use 'debug'.

cat /proc/cmdline 
placeholder root=UUID=b5711e0a-3fc8-44ec-940f-112e60d8f143 ro debug

So I suppose I did it right. Maybe I didn't compile something important in?

-- 
Lukáš Hejtmánek

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-21 14:18                 ` Konrad Rzeszutek Wilk
@ 2013-10-21 14:23                   ` Lukas Hejtmanek
       [not found]                   ` <20131021141855.GA4211-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  2013-10-21 14:27                   ` Jan Beulich
  2 siblings, 0 replies; 40+ messages in thread
From: Lukas Hejtmanek @ 2013-10-21 14:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: roland, linux-rdma, Jan Beulich, xen-devel

On Mon, Oct 21, 2013 at 10:18:55AM -0400, Konrad Rzeszutek Wilk wrote:
> Odd, there should be messages about 1-1 mapping when you use 'debug'.

cat /proc/cmdline 
placeholder root=UUID=b5711e0a-3fc8-44ec-940f-112e60d8f143 ro debug

So I suppose I did it right. Maybe I didn't compile something important in?

-- 
Lukáš Hejtmánek

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                   ` <20131021141855.GA4211-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  2013-10-21 14:23                     ` [Xen-devel] " Lukas Hejtmanek
@ 2013-10-21 14:27                     ` Jan Beulich
       [not found]                       ` <5265560602000078000FC73E-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
  2013-10-21 14:44                       ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-21 14:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Lukas Hejtmanek, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

>>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
>>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
>...
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
>  
>                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>         }
> +       /* Anything past the balloon area is marked as identity. */
> +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
> +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));

Hardly - MAX_DOMAIN_PAGES derives from
CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
to where MMIO might be. Should you perhaps simply start from
an all 1:1 mapping, inserting the RAM translations as you find
them?
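
As a rough illustration of that idea, here is a user-space toy, not a
proposal for the actual P2M code. struct toy_p2m and its helpers are
hypothetical names, the range values in the example are made up, and
ballooned-out pages (which would still need an invalid marker) are ignored:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RANGES 8

/* One contiguous run of RAM pfns and the machine frame backing its start. */
struct toy_ram_range {
	uint64_t start_pfn;
	uint64_t nr_pages;
	uint64_t start_mfn;
};

struct toy_p2m {
	struct toy_ram_range ram[TOY_MAX_RANGES];
	size_t nr;
};

/* Record a RAM translation; returns 0 on success, -1 if the table is full. */
static int toy_p2m_add_ram(struct toy_p2m *p2m, uint64_t start_pfn,
			   uint64_t nr_pages, uint64_t start_mfn)
{
	if (p2m->nr == TOY_MAX_RANGES)
		return -1;
	p2m->ram[p2m->nr].start_pfn = start_pfn;
	p2m->ram[p2m->nr].nr_pages = nr_pages;
	p2m->ram[p2m->nr].start_mfn = start_mfn;
	p2m->nr++;
	return 0;
}

/* Lookup: known RAM ranges win; anything not listed falls back to 1:1. */
static uint64_t toy_p2m_lookup(const struct toy_p2m *p2m, uint64_t pfn)
{
	size_t i;

	for (i = 0; i < p2m->nr; i++) {
		const struct toy_ram_range *r = &p2m->ram[i];

		if (pfn >= r->start_pfn && pfn - r->start_pfn < r->nr_pages)
			return r->start_mfn + (pfn - r->start_pfn);
	}
	return pfn;
}

static void toy_p2m_example(void)
{
	struct toy_p2m p2m = { .nr = 0 };

	/* Made-up RAM run covering pfns 0x100000..0x107ffff. */
	assert(toy_p2m_add_ram(&p2m, 0x100000, 0xf80000, 0x200000) == 0);
	assert(toy_p2m_lookup(&p2m, 0x100010) == 0x200010);
	/* The BAR pfn was never inserted, so it stays identity mapped. */
	assert(toy_p2m_lookup(&p2m, 0x380fff000ULL) == 0x380fff000ULL);
}
```

The point of the default is that no upper bound such as MAX_DOMAIN_PAGES is
needed: an MMIO pfn is 1:1 simply because nobody ever inserted a RAM
translation for it.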

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-21 14:18                 ` Konrad Rzeszutek Wilk
  2013-10-21 14:23                   ` Lukas Hejtmanek
       [not found]                   ` <20131021141855.GA4211-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2013-10-21 14:27                   ` Jan Beulich
  2 siblings, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-21 14:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: roland, linux-rdma, Lukas Hejtmanek, xen-devel

>>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
>>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
>...
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
>  
>                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>         }
> +       /* Anything past the balloon area is marked as identity. */
> +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
> +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));

Hardly - MAX_DOMAIN_PAGES derives from
CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
to where MMIO might be. Should you perhaps simply start from
an all 1:1 mapping, inserting the RAM translations as you find
them?

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                       ` <5265560602000078000FC73E-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
@ 2013-10-21 14:44                         ` Konrad Rzeszutek Wilk
  2013-10-21 15:12                           ` Jan Beulich
       [not found]                           ` <20131021144407.GC4560-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  0 siblings, 2 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-21 14:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Lukas Hejtmanek, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
> >>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
> >>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
> >...
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
> >  
> >                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> >         }
> > +       /* Anything past the balloon area is marked as identity. */
> > +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
> > +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
> 
> Hardly - MAX_DOMAIN_PAGES derives from
> CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
> to where MMIO might be. Should you perhaps simply start from

Looks like your mailer ate some words.

> an all 1:1 mapping, inserting the RAM translations as you find
> them?


Yeah, as this code can be called for the regions under 4GB. Definitely
needs more analysis.

Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-21 14:27                     ` Jan Beulich
       [not found]                       ` <5265560602000078000FC73E-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
@ 2013-10-21 14:44                       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-21 14:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: roland, linux-rdma, Lukas Hejtmanek, xen-devel

On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
> >>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
> >>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
> >...
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
> >  
> >                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> >         }
> > +       /* Anything past the balloon area is marked as identity. */
> > +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
> > +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
> 
> Hardly - MAX_DOMAIN_PAGES derives from
> CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
> to where MMIO might be. Should you perhaps simply start from

Looks like your mailer ate some words.

> an all 1:1 mapping, inserting the RAM translations as you find
> them?


Yeah, as this code can be called for the regions under 4GB. Definitely
needs more analysis.

Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                           ` <20131021144407.GC4560-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2013-10-21 15:12                             ` Jan Beulich
  2013-10-23 15:36                               ` Konrad Rzeszutek Wilk
       [not found]                               ` <5265609802000078000FC7B7-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
  0 siblings, 2 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-21 15:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Lukas Hejtmanek, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

>>> On 21.10.13 at 16:44, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
>> >>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>> > On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
>> >>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
>> >...
>> > --- a/arch/x86/xen/setup.c
>> > +++ b/arch/x86/xen/setup.c
>> > @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
>> >  
>> >                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>> >         }
>> > +       /* Anything past the balloon area is marked as identity. */
>> > +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
>> > +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
>> 
>> Hardly - MAX_DOMAIN_PAGES derives from
>> CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
>> to where MMIO might be. Should you perhaps simply start from
> 
> Looks like your mailer ate some words.

I don't think so - they're all there in the text you quoted.

>> an all 1:1 mapping, inserting the RAM translations as you find
>> them?
> 
> 
> Yeah, as this code can be called for the regions under 4GB. Definitely
> needs more analysis.
> 
> Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?

That was for PVH, and is obviously fragile, as there can be MMIO
regions not matched by any PCI device's BAR. We could hope for
all of them to be below 4Gb, but I think (based on logs I got to see
recently from a certain vendor's upcoming systems) this isn't going
to work out.

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-21 14:44                         ` Konrad Rzeszutek Wilk
@ 2013-10-21 15:12                           ` Jan Beulich
       [not found]                           ` <20131021144407.GC4560-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  1 sibling, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-21 15:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: roland, linux-rdma, Lukas Hejtmanek, xen-devel

>>> On 21.10.13 at 16:44, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
>> >>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
>> >>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
>> >...
>> > --- a/arch/x86/xen/setup.c
>> > +++ b/arch/x86/xen/setup.c
>> > @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
>> >  
>> >                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>> >         }
>> > +       /* Anything past the balloon area is marked as identity. */
>> > +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
>> > +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
>> 
>> Hardly - MAX_DOMAIN_PAGES derives from
>> CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
>> to where MMIO might be. Should you perhaps simply start from
> 
> Looks like your mailer ate some words.

I don't think so - they're all there in the text you quoted.

>> an all 1:1 mapping, inserting the RAM translations as you find
>> them?
> 
> 
> Yeah, as this code can be called for the regions under 4GB. Definitely
> needs more analysis.
> 
> Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?

That was for PVH, and is obviously fragile, as there can be MMIO
regions not matched by any PCI device's BAR. We could hope for
all of them to be below 4Gb, but I think (based on logs I got to see
recently from a certain vendor's upcoming systems) this isn't going
to work out.

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                               ` <5265609802000078000FC7B7-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
@ 2013-10-23 15:36                                 ` Konrad Rzeszutek Wilk
  2013-10-23 15:45                                   ` Jan Beulich
                                                     ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-23 15:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Lukas Hejtmanek, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
> >>> On 21.10.13 at 16:44, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> > On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
> >> >>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> >> > On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
> >> >>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
> >> >...
> >> > --- a/arch/x86/xen/setup.c
> >> > +++ b/arch/x86/xen/setup.c
> >> > @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
> >> >  
> >> >                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> >> >         }
> >> > +       /* Anything past the balloon area is marked as identity. */
> >> > +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
> >> > +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
> >> 
> >> Hardly - MAX_DOMAIN_PAGES derives from
> >> CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
> >> to where MMIO might be. Should you perhaps simply start from
> > 
> > Looks like your mailer ate some words.
> 
> I don't think so - they're all there in the text you quoted.
> 
> >> an all 1:1 mapping, inserting the RAM translations as you find
> >> them?
> > 
> > 
> > Yeah, as this code can be called for the regions under 4GB. Definitely
> > needs more analysis.
> > 
> > Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?
> 
> That was for PVH, and is obviously fragile, as there can be MMIO
> regions not matched by any PCI device's BAR. We could hope for
> all of them to be below 4Gb, but I think (based on logs I got to see
> recently from a certain vendor's upcoming systems) this isn't going
> to work out.

This is the patch I had in mind that I think will fix these issues. I would
appreciate it if you could test it and, if possible, send me the dmesg.


diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index b232908..258e3f9 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -133,6 +133,25 @@ static void balloon_append(struct page *page)
 	adjust_managed_page_count(page, -1);
 }
 
+/*
+ * Check if any of the balloon pages overlap with the supplied
+ * pfn and its range.
+ */
+bool balloon_pfn(unsigned long pfn, unsigned long nr)
+{
+	struct page *page;
+
+	if (list_empty(&ballooned_pages))
+		return false;
+
+	list_for_each_entry(page, &ballooned_pages, lru) {
+		unsigned long b_pfn = page_to_pfn(page);
+
+		if (b_pfn >= pfn && b_pfn < pfn + nr)
+			return true;
+	}
+	return false;
+}
 /* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
 static struct page *balloon_retrieve(bool prefer_highmem)
 {
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 18fff88..7e5ff49 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -17,11 +17,16 @@
  * Author: Weidong Han <weidong.han-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  */
 
+#define DEBUG 1
+
 #include <linux/pci.h>
 #include <linux/acpi.h>
 #include <xen/xen.h>
 #include <xen/interface/physdev.h>
 #include <xen/interface/xen.h>
+#include <xen/interface/memory.h>
+#include <xen/page.h>
+#include <xen/balloon.h>
 
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
@@ -123,10 +128,78 @@ static int xen_add_device(struct device *dev)
 		r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add,
 			&manage_pci);
 	}
-
 	return r;
 }
 
+static void xen_p2m_add_device(struct device *dev)
+{
+	int i;
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
+	/* Verify whether the MMIO BARs are 1-1 in the P2M. */
+	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+		unsigned long pfn, start, end, ok_pfns;
+		char bus_addr[64];
+		char *fmt;
+
+		if (!pci_resource_len(pci_dev, i))
+			continue;
+
+		if (pci_resource_flags(pci_dev, i) == IORESOURCE_IO)
+			fmt = " (bus address [%#06llx-%#06llx])";
+		else
+			fmt = " (bus address [%#010llx-%#010llx])";
+
+		snprintf(bus_addr, sizeof(bus_addr), fmt,
+			 (unsigned long long) (pci_resource_start(pci_dev, i)),
+			 (unsigned long long) (pci_resource_end(pci_dev, i)));
+
+		start = pci_resource_start(pci_dev, i) >> PAGE_SHIFT;
+		end = (pci_resource_end(pci_dev, i) >> PAGE_SHIFT) + 1;
+
+		/*
+		 * We don't worry about the balloon scratch page as it has a
+		 * valid PFN - which means we will catch it in the loop below.
+		 */
+		if (balloon_pfn(start, end - start)) {
+			dev_warn(dev, "%s is within balloon pages!\n", bus_addr);
+			continue;
+		}
+
+		for (ok_pfns = 0, pfn = start; pfn < end; pfn ++) {
+			unsigned long mfn = pfn_to_mfn(pfn);
+
+			if (mfn == pfn) {
+				ok_pfns ++;
+				continue;
+			}
+			if (mfn != INVALID_P2M_ENTRY) { /* RAM */
+				dev_warn(dev, "%s is within RAM [%lx] region!\n", bus_addr, pfn);
+				break;
+			}
+		}
+		dev_dbg(dev, "%s pfn:%lx, s:%lx, e:%lx ok:%ld\n", bus_addr, pfn, start, end, ok_pfns);
+		if (pfn != end) /* We broke out of the loop above. */
+			continue;
+
+		if (ok_pfns == end - start) /* All good. */
+			continue;
+
+		dev_dbg(dev, "%s [%lx->%lx]\n", bus_addr, start, end);
+
+		/* This BAR was not detected during E820 parsing. */
+		for (pfn = start; pfn < end; pfn ++) {
+			if (!set_phys_to_machine(pfn, pfn))
+				break;
+		}
+		WARN(pfn != end, "Only set %ld instead of %ld PFNs!\n",
+		     pfn - start, end - start);
+
+		dev_info(dev, "%s set %ld page(s) to 1-1 mapping.\n",
+			 bus_addr, pfn - start);
+	}
+}
+
 static int xen_remove_device(struct device *dev)
 {
 	int r;
@@ -164,10 +237,14 @@ static int xen_pci_notifier(struct notifier_block *nb,
 
 	switch (action) {
 	case BUS_NOTIFY_ADD_DEVICE:
-		r = xen_add_device(dev);
+		if (xen_initial_domain())
+			r = xen_add_device(dev);
+		if (r == 0)
+			xen_p2m_add_device(dev);
 		break;
 	case BUS_NOTIFY_DEL_DEVICE:
-		r = xen_remove_device(dev);
+		if (xen_initial_domain())
+			r = xen_remove_device(dev);
 		break;
 	default:
 		return NOTIFY_DONE;
@@ -185,9 +262,8 @@ static struct notifier_block device_nb = {
 
 static int __init register_xen_pci_notifier(void)
 {
-	if (!xen_initial_domain())
+	if (!xen_domain())
 		return 0;
-
 	return bus_register_notifier(&pci_bus_type, &device_nb);
 }
 
diff --git a/include/xen/balloon.h b/include/xen/balloon.h
index a4c1c6a..60ecc50 100644
--- a/include/xen/balloon.h
+++ b/include/xen/balloon.h
@@ -41,3 +41,4 @@ static inline int register_xen_selfballooning(struct device *dev)
 	return -ENOSYS;
 }
 #endif
+bool balloon_pfn(unsigned long pfn, unsigned long nr);
> 
> Jan
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-21 15:12                             ` [Xen-devel] " Jan Beulich
@ 2013-10-23 15:36                               ` Konrad Rzeszutek Wilk
       [not found]                               ` <5265609802000078000FC7B7-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
  1 sibling, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-23 15:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: roland, linux-rdma, Lukas Hejtmanek, xen-devel

On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
> >>> On 21.10.13 at 16:44, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
> >> >>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >> > On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
> >> >>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
> >> >...
> >> > --- a/arch/x86/xen/setup.c
> >> > +++ b/arch/x86/xen/setup.c
> >> > @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
> >> >  
> >> >                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> >> >         }
> >> > +       /* Anything past the balloon area is marked as identity. */
> >> > +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
> >> > +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
> >> 
> >> Hardly - MAX_DOMAIN_PAGES derives from
> >> CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
> >> to where MMIO might be. Should you perhaps simply start from
> > 
> > Looks like your mailer ate some words.
> 
> I don't think so - they're all there in the text you quoted.
> 
> >> an all 1:1 mapping, inserting the RAM translations as you find
> >> them?
> > 
> > 
> > Yeah, as this code can be called for the regions under 4GB. Definitely
> > needs more analysis.
> > 
> > Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?
> 
> That was for PVH, and is obviously fragile, as there can be MMIO
> regions not matched by any PCI device's BAR. We could hope for
> all of them to be below 4Gb, but I think (based on logs I got to see
> recently from a certain vendor's upcoming systems) this isn't going
> to work out.

This is the patch I had in mind; I think it will fix these issues. I would
appreciate it if you could test it and, if possible, send me the dmesg.


diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index b232908..258e3f9 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -133,6 +133,25 @@ static void balloon_append(struct page *page)
 	adjust_managed_page_count(page, -1);
 }
 
+/*
+ * Check if any of the balloon pages overlap with the supplied
+ * pfn and its range.
+ */
+bool balloon_pfn(unsigned long pfn, unsigned long nr)
+{
+	struct page *page;
+
+	if (list_empty(&ballooned_pages))
+		return false;
+
+	list_for_each_entry(page, &ballooned_pages, lru) {
+		unsigned long b_pfn = page_to_pfn(page);
+
+		if (b_pfn >= pfn && b_pfn < pfn + nr)
+			return true;
+	}
+	return false;
+}
 /* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
 static struct page *balloon_retrieve(bool prefer_highmem)
 {
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 18fff88..7e5ff49 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -17,11 +17,16 @@
  * Author: Weidong Han <weidong.han@intel.com>
  */
 
+#define DEBUG 1
+
 #include <linux/pci.h>
 #include <linux/acpi.h>
 #include <xen/xen.h>
 #include <xen/interface/physdev.h>
 #include <xen/interface/xen.h>
+#include <xen/interface/memory.h>
+#include <xen/page.h>
+#include <xen/balloon.h>
 
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
@@ -123,10 +128,78 @@ static int xen_add_device(struct device *dev)
 		r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add,
 			&manage_pci);
 	}
-
 	return r;
 }
 
+static void xen_p2m_add_device(struct device *dev)
+{
+	int i;
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
+	/* Verify whether the MMIO BARs are 1-1 in the P2M. */
+	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+		unsigned long pfn, start, end, ok_pfns;
+		char bus_addr[64];
+		char *fmt;
+
+		if (!pci_resource_len(pci_dev, i))
+			continue;
+
+		if (pci_resource_flags(pci_dev, i) & IORESOURCE_IO)
+			fmt = " (bus address [%#06llx-%#06llx])";
+		else
+			fmt = " (bus address [%#010llx-%#010llx])";
+
+		snprintf(bus_addr, sizeof(bus_addr), fmt,
+			 (unsigned long long) (pci_resource_start(pci_dev, i)),
+			 (unsigned long long) (pci_resource_end(pci_dev, i)));
+
+		start = pci_resource_start(pci_dev, i) >> PAGE_SHIFT;
+		end = (pci_resource_end(pci_dev, i) >> PAGE_SHIFT) + 1;
+
+		/*
+		 * We don't worry about the balloon scratch page as it has a
+		 * valid PFN - which means we will catch it in the loop below.
+		 */
+		if (balloon_pfn(start, end - start)) {
+			dev_warn(dev, "%s is within balloon pages!\n", bus_addr);
+			continue;
+		}
+
+		for (ok_pfns = 0, pfn = start; pfn < end; pfn ++) {
+			unsigned long mfn = pfn_to_mfn(pfn);
+
+			if (mfn == pfn) {
+				ok_pfns ++;
+				continue;
+			}
+			if (mfn != INVALID_P2M_ENTRY) { /* RAM */
+				dev_warn(dev, "%s is within RAM [%lx] region!\n", bus_addr, pfn);
+				break;
+			}
+		}
+		dev_dbg(dev, "%s pfn:%lx, s:%lx, e:%lx ok:%ld\n", bus_addr, pfn, start, end, ok_pfns);
+		if (pfn != end) /* We broke out of the loop above. */
+			continue;
+
+		if (ok_pfns == end - start) /* All good. */
+			continue;
+
+		dev_dbg(dev, "%s [%lx->%lx]\n", bus_addr, start, end);
+
+		/* This BAR was not detected during E820 parsing. */
+		for (pfn = start; pfn < end; pfn ++) {
+			if (!set_phys_to_machine(pfn, pfn))
+				break;
+		}
+		WARN(pfn != end, "Only set %ld instead of %ld PFNs!\n",
+		     pfn - start, end - start);
+
+		dev_info(dev, "%s set %ld page(s) to 1-1 mapping.\n",
+			 bus_addr, pfn - start);
+	}
+}
+
 static int xen_remove_device(struct device *dev)
 {
 	int r;
@@ -164,10 +237,14 @@ static int xen_pci_notifier(struct notifier_block *nb,
 
 	switch (action) {
 	case BUS_NOTIFY_ADD_DEVICE:
-		r = xen_add_device(dev);
+		if (xen_initial_domain())
+			r = xen_add_device(dev);
+		if (r == 0)
+			xen_p2m_add_device(dev);
 		break;
 	case BUS_NOTIFY_DEL_DEVICE:
-		r = xen_remove_device(dev);
+		if (xen_initial_domain())
+			r = xen_remove_device(dev);
 		break;
 	default:
 		return NOTIFY_DONE;
@@ -185,9 +262,8 @@ static struct notifier_block device_nb = {
 
 static int __init register_xen_pci_notifier(void)
 {
-	if (!xen_initial_domain())
+	if (!xen_domain())
 		return 0;
-
 	return bus_register_notifier(&pci_bus_type, &device_nb);
 }
 
diff --git a/include/xen/balloon.h b/include/xen/balloon.h
index a4c1c6a..60ecc50 100644
--- a/include/xen/balloon.h
+++ b/include/xen/balloon.h
@@ -41,3 +41,4 @@ static inline int register_xen_selfballooning(struct device *dev)
 	return -ENOSYS;
 }
 #endif
+bool balloon_pfn(unsigned long pfn, unsigned long nr);
> 
> Jan
> 
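
For reference, pci_resource_end() returns the last byte of a BAR
(inclusive), so turning a BAR into a page range needs a +1 on the end
PFN. The sketch below is a minimal userspace illustration, not kernel
code: the pfn_range type and both helper names are made up for this
example, and PAGE_SHIFT is assumed to be 12 (4 KiB pages).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86 */

/* Hypothetical helper type: half-open PFN range [start, end). */
struct pfn_range {
	uint64_t start;	/* first PFN covered by the BAR */
	uint64_t end;	/* one past the last PFN covered */
};

/*
 * res_start and res_end are byte addresses; res_end is inclusive,
 * which is the convention pci_resource_end() follows.
 */
static struct pfn_range bar_to_pfn_range(uint64_t res_start, uint64_t res_end)
{
	struct pfn_range r;

	assert(res_end >= res_start);
	r.start = res_start >> PAGE_SHIFT;
	r.end = (res_end >> PAGE_SHIFT) + 1;
	return r;
}

/* Number of pages in the half-open range. */
static uint64_t pfn_range_pages(struct pfn_range r)
{
	return r.end - r.start;
}
```

For the 8M "Region 2" at 380fff000000 quoted earlier in the thread,
this gives PFNs 380fff000 through 380fff7ff, i.e. 2048 pages.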

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                                   ` <20131023153645.GA28011-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2013-10-23 15:45                                     ` Jan Beulich
  2013-10-23 16:04                                       ` Konrad Rzeszutek Wilk
       [not found]                                       ` <5267FD3102000078000A56A1-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
  2013-10-24 23:08                                     ` [Xen-devel] " David Vrabel
  1 sibling, 2 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-23 15:45 UTC (permalink / raw)
  To: konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA
  Cc: xhejtman-8qz54MUs51PtwjQa/ONI9g, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

>>> Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 10/23/13 5:37 PM >>>
>On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
>> > Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?
>> 
>> That was for PVH, and is obviously fragile, as there can be MMIO
>> regions not matched by any PCI device's BAR. We could hope for
>> all of them to be below 4Gb, but I think (based on logs I got to see
>> recently from a certain vendor's upcoming systems) this isn't going
>> to work out.
>
>This is the patch I had in mind that I think will fix these issues. But
>I would appreciate testing it and naturally send me the dmesg if possible.

So this indeed is only about PCI devices (i.e. not taking into account the
comment I made earlier [above]).

Further, a brute force loop over all balloon pages seems like a pretty
bad thing to do when the balloon is rather big.

And finally - did you check that the bus notification happens after resource
assignment?

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-23 15:36                                 ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2013-10-23 15:45                                   ` Jan Beulich
       [not found]                                   ` <20131023153645.GA28011-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  2013-10-24 23:08                                   ` David Vrabel
  2 siblings, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-23 15:45 UTC (permalink / raw)
  To: konrad.wilk; +Cc: roland, linux-rdma, xhejtman, xen-devel

>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 10/23/13 5:37 PM >>>
>On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
>> > Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?
>> 
>> That was for PVH, and is obviously fragile, as there can be MMIO
>> regions not matched by any PCI device's BAR. We could hope for
>> all of them to be below 4Gb, but I think (based on logs I got to see
>> recently from a certain vendor's upcoming systems) this isn't going
>> to work out.
>
>This is the patch I had in mind that I think will fix these issues. But
>I would appreciate testing it and naturally send me the dmesg if possible.

So this indeed is only about PCI devices (i.e. not taking into account the
comment I made earlier [above]).

Further, a brute force loop over all balloon pages seems like a pretty
bad thing to do when the balloon is rather big.
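
One way to avoid a full walk, sketched here purely as an illustration:
if the ballooned PFNs were available as an ascending sorted array, the
overlap check could be a binary search. This is an assumption made for
the example; the balloon driver in the patch keeps pages on a linked
list, and sorted_pfns_overlap() is a made-up name, not a kernel
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical alternative to walking the whole balloon list: assumes
 * the ballooned PFNs are stored in an ascending sorted array.
 * Returns true if any entry lies in [pfn, pfn + nr).
 */
static bool sorted_pfns_overlap(const unsigned long *pfns, size_t count,
				unsigned long pfn, unsigned long nr)
{
	size_t lo = 0, hi = count;

	assert(pfns != NULL || count == 0);

	/* Binary search for the first entry >= pfn (lower bound). */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pfns[mid] < pfn)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* Compare as an offset so that pfn + nr cannot overflow. */
	return lo < count && pfns[lo] - pfn < nr;
}
```

The lookup is O(log n) in the number of ballooned pages, at the cost of
keeping the array sorted as pages enter and leave the balloon.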

And finally - did you check that the bus notification happens after resource
assignment?

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                                       ` <5267FD3102000078000A56A1-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
@ 2013-10-23 16:04                                         ` Konrad Rzeszutek Wilk
       [not found]                                           ` <20131023160433.GA28260-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  2013-10-23 16:35                                           ` Jan Beulich
  0 siblings, 2 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-23 16:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xhejtman-8qz54MUs51PtwjQa/ONI9g, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 23, 2013 at 04:45:37PM +0100, Jan Beulich wrote:
> >>> Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 10/23/13 5:37 PM >>>
> >On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
> >> > Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?
> >> 
> >> That was for PVH, and is obviously fragile, as there can be MMIO
> >> regions not matched by any PCI device's BAR. We could hope for
> >> all of them to be below 4Gb, but I think (based on logs I got to see
> >> recently from a certain vendor's upcoming systems) this isn't going
> >> to work out.
> >
> >This is the patch I had in mind that I think will fix these issues. But
> >I would appreciate testing it and naturally send me the dmesg if possible.
> 
> So this indeed is only about PCI devices (i.e. not taking into account the
> comment I made earlier [above]).

Correct.
What are some of those devices? It would help to understand what those are.

> 
> Further, a brute force loop over all balloon pages seems like a pretty
> bad thing to do when the balloon is rather big.

Sure.
> 
> And finally - did you check that the bus notification happens after resource
> assignment?

They do occur during the normal resource assignment. But I presume you
meant during resource re-assignment?

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-23 15:45                                     ` [Xen-devel] " Jan Beulich
@ 2013-10-23 16:04                                       ` Konrad Rzeszutek Wilk
       [not found]                                       ` <5267FD3102000078000A56A1-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
  1 sibling, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-23 16:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: roland, linux-rdma, xhejtman, xen-devel

On Wed, Oct 23, 2013 at 04:45:37PM +0100, Jan Beulich wrote:
> >>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 10/23/13 5:37 PM >>>
> >On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
> >> > Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?
> >> 
> >> That was for PVH, and is obviously fragile, as there can be MMIO
> >> regions not matched by any PCI device's BAR. We could hope for
> >> all of them to be below 4Gb, but I think (based on logs I got to see
> >> recently from a certain vendor's upcoming systems) this isn't going
> >> to work out.
> >
> >This is the patch I had in mind that I think will fix these issues. But
> >I would appreciate testing it and naturally send me the dmesg if possible.
> 
> So this indeed is only about PCI devices (i.e. not taking into account the
> comment I made earlier [above]).

Correct.
What are some of those devices? It would help to understand what those are.

> 
> Further, a brute force loop over all balloon pages seems like a pretty
> bad thing to do when the balloon is rather big.

Sure.
> 
> And finally - did you check that the bus notification happens after resource
> assignment?

They do occur during the normal resource assignment. But I presume you
meant during resource re-assignment?

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                                           ` <20131023160433.GA28260-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2013-10-23 16:35                                             ` Jan Beulich
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-23 16:35 UTC (permalink / raw)
  To: konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA
  Cc: xhejtman-8qz54MUs51PtwjQa/ONI9g, roland-DgEjT+Ai2ygdnm+yROfE0A,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

>>> Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 10/23/13 6:05 PM >>>
>On Wed, Oct 23, 2013 at 04:45:37PM +0100, Jan Beulich wrote:
> So this indeed is only about PCI devices (i.e. not taking into account the
> comment I made earlier [above]).
>
>Correct.
>What are some of those devices? It would help to understand what those are.

The simplest example is MCFG ranges, which aren't required to be
present in the E820 map.

>> And finally - did you check that the bus notification happens after resource
>> assignment?
>
>They do occur during the normal resource assignment. But I presume you
>meant during resource re-assignment?

No, I meant only assignment - iirc, re-assignment is still unimplemented.

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-23 16:04                                         ` [Xen-devel] " Konrad Rzeszutek Wilk
       [not found]                                           ` <20131023160433.GA28260-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2013-10-23 16:35                                           ` Jan Beulich
  1 sibling, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2013-10-23 16:35 UTC (permalink / raw)
  To: konrad.wilk; +Cc: roland, linux-rdma, xhejtman, xen-devel

>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 10/23/13 6:05 PM >>>
>On Wed, Oct 23, 2013 at 04:45:37PM +0100, Jan Beulich wrote:
> So this indeed is only about PCI devices (i.e. not taking into account the
> comment I made earlier [above]).
>
>Correct.
>What are some of those devices? It would help to understand what those are.

The simplest example is MCFG ranges, which aren't required to be
present in the E820 map.
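
To make the point concrete, here is a small sketch of a coverage check
against a memory map. Everything in it is an assumption for
illustration: struct map_entry is a simplified stand-in for a firmware
map entry (no type field), and range_in_map() is a made-up helper, not
a kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a firmware memory map entry (hypothetical). */
struct map_entry {
	uint64_t addr;	/* first byte of the region */
	uint64_t size;	/* length in bytes */
};

/*
 * Returns true if [addr, addr + size) lies entirely inside one entry.
 * A range that fails this test is valid MMIO that a scan of the map
 * alone would never see.
 */
static bool range_in_map(const struct map_entry *map, size_t n,
			 uint64_t addr, uint64_t size)
{
	size_t i;

	assert(map != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Written with subtractions so addr + size cannot overflow. */
		if (addr >= map[i].addr &&
		    size <= map[i].size &&
		    addr - map[i].addr <= map[i].size - size)
			return true;
	}
	return false;
}
```

Code that derives its 1:1 regions only from such a map would skip any
range for which this check returns false.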

>> And finally - did you check that the bus notification happens after resource
>> assignment?
>
>They do occur during the normal resource assignment. But I presume you
>meant during resource re-assignment?

No, I meant only assignment - iirc, re-assignment is still unimplemented.

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                                   ` <20131023153645.GA28011-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  2013-10-23 15:45                                     ` [Xen-devel] " Jan Beulich
@ 2013-10-24 23:08                                     ` David Vrabel
       [not found]                                       ` <5269A865.2010100-5LkwijKnu/2sTnJN9+BGXg@public.gmane.org>
  2013-10-25 14:21                                       ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 40+ messages in thread
From: David Vrabel @ 2013-10-24 23:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jan Beulich, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Lukas Hejtmanek,
	xen-devel-GuqFBffKawuEi8DpZVb4nw

On 23/10/13 16:36, Konrad Rzeszutek Wilk wrote:
> On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
>>>>> On 21.10.13 at 16:44, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>> On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
>>>>>>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>>>> On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
>>>>>>          Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
>>>>> ...
>>>>> --- a/arch/x86/xen/setup.c
>>>>> +++ b/arch/x86/xen/setup.c
>>>>> @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
>>>>>
>>>>>                  __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>>>>>          }
>>>>> +       /* Anything past the balloon area is marked as identity. */
>>>>> +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
>>>>> +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
>>>>
>>>> Hardly - MAX_DOMAIN_PAGES derives from
>>>> CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
>>>> to where MMIO might be. Should you perhaps simply start from
>>>
>>> Looks like your mailer ate some words.
>>
>> I don't think so - they're all there in the text you quoted.
>>
>>>> an all 1:1 mapping, inserting the RAM translations as you find
>>>> them?
>>>
>>>
>>> Yeah, as this code can be called for the regions under 4GB. Definitely
>>> needs more analysis.
>>>
>>> Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?
>>
>> That was for PVH, and is obviously fragile, as there can be MMIO
>> regions not matched by any PCI device's BAR. We could hope for
>> all of them to be below 4Gb, but I think (based on logs I got to see
>> recently from a certain vendor's upcoming systems) this isn't going
>> to work out.
>
> This is the patch I had in mind that I think will fix these issues. But
> I would appreciate testing it and naturally send me the dmesg if possible.

I think there is a simpler way to handle this.

If INVALID_P2M_ENTRY implies 1:1 and we arrange:

a) pfn_to_mfn() to return pfn if the mfn is missing in the p2m
b) mfn_to_pfn() to return mfn if p2m(m2p(mfn)) != mfn and there is no 
m2p override.

Then:

a) The identity p2m entries can be removed.
b) _PAGE_IOMAP becomes unnecessary.

David

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: BUG: bad page map under Xen
  2013-10-23 15:36                                 ` [Xen-devel] " Konrad Rzeszutek Wilk
  2013-10-23 15:45                                   ` Jan Beulich
       [not found]                                   ` <20131023153645.GA28011-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2013-10-24 23:08                                   ` David Vrabel
  2 siblings, 0 replies; 40+ messages in thread
From: David Vrabel @ 2013-10-24 23:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: roland, linux-rdma, Lukas Hejtmanek, Jan Beulich, xen-devel

On 23/10/13 16:36, Konrad Rzeszutek Wilk wrote:
> On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
>>>>> On 21.10.13 at 16:44, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>> On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
>>>>>>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>>>> On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
>>>>>>          Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
>>>>> ...
>>>>> --- a/arch/x86/xen/setup.c
>>>>> +++ b/arch/x86/xen/setup.c
>>>>> @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
>>>>>
>>>>>                  __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>>>>>          }
>>>>> +       /* Anything past the balloon area is marked as identity. */
>>>>> +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
>>>>> +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
>>>>
>>>> Hardly - MAX_DOMAIN_PAGES derives from
>>>> CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
>>>> to where MMIO might be. Should you perhaps simply start from
>>>
>>> Looks like your mailer ate some words.
>>
>> I don't think so - they're all there in the text you quoted.
>>
>>>> an all 1:1 mapping, inserting the RAM translations as you find
>>>> them?
>>>
>>>
>>> Yeah, as this code can be called for the regions under 4GB. Definitely
>>> needs more analysis.
>>>
>>> Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?
>>
>> That was for PVH, and is obviously fragile, as there can be MMIO
>> regions not matched by any PCI device's BAR. We could hope for
>> all of them to be below 4Gb, but I think (based on logs I got to see
>> recently from a certain vendor's upcoming systems) this isn't going
>> to work out.
>
> This is the patch I had in mind that I think will fix these issues. But
> I would appreciate testing it and naturally send me the dmesg if possible.

I think there is a simpler way to handle this.

If INVALID_P2M_ENTRY implies 1:1 and we arrange:

a) pfn_to_mfn() to return pfn if the mfn is missing in the p2m
b) mfn_to_pfn() to return mfn if p2m(m2p(mfn)) != mfn and there is no 
m2p override.

Then:

a) The identity p2m entries can be removed.
b) _PAGE_IOMAP becomes unnecessary.
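
A toy model of point a) in the first list, to show the intended lookup
behaviour. This is only a sketch under stated assumptions: the flat
array, struct toy_p2m, TOY_INVALID_ENTRY and toy_pfn_to_mfn() are all
invented for the example, whereas the real P2M is a tree and
pfn_to_mfn() has more cases to handle.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INVALID_ENTRY (~0UL)	/* stand-in for INVALID_P2M_ENTRY */

/* Toy P2M: a flat array indexed by PFN (the real P2M is a tree). */
struct toy_p2m {
	const unsigned long *mfn;	/* mfn[pfn], or TOY_INVALID_ENTRY */
	size_t nr;			/* number of PFNs covered */
};

/*
 * Lookup with the proposed fallback: a PFN that is outside the table
 * or has no entry is treated as 1:1, so the PFN itself is returned.
 */
static unsigned long toy_pfn_to_mfn(const struct toy_p2m *p2m,
				    unsigned long pfn)
{
	assert(p2m != NULL);
	if (pfn >= p2m->nr || p2m->mfn[pfn] == TOY_INVALID_ENTRY)
		return pfn;
	return p2m->mfn[pfn];
}
```

Note that in this toy model a page with no entry because it was
ballooned out is indistinguishable from an identity-mapped MMIO page.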

David

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                                       ` <5269A865.2010100-5LkwijKnu/2sTnJN9+BGXg@public.gmane.org>
@ 2013-10-25 14:21                                         ` Konrad Rzeszutek Wilk
       [not found]                                           ` <20131025142147.GB3742-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
  2013-12-26  6:39                                           ` Zhang, Yang Z
  0 siblings, 2 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-25 14:21 UTC (permalink / raw)
  To: David Vrabel
  Cc: Jan Beulich, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Lukas Hejtmanek,
	xen-devel-GuqFBffKawuEi8DpZVb4nw

On Fri, Oct 25, 2013 at 12:08:21AM +0100, David Vrabel wrote:
> On 23/10/13 16:36, Konrad Rzeszutek Wilk wrote:
> >On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
> >>>>>On 21.10.13 at 16:44, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> >>>On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
> >>>>>>>On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> >>>>>On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
> >>>>>>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
> >>>>>...
> >>>>>--- a/arch/x86/xen/setup.c
> >>>>>+++ b/arch/x86/xen/setup.c
> >>>>>@@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
> >>>>>
> >>>>>                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> >>>>>         }
> >>>>>+       /* Anything past the balloon area is marked as identity. */
> >>>>>+       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
> >>>>>+               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
> >>>>
> >>>>Hardly - MAX_DOMAIN_PAGES derives from
> >>>>CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated
> >>>>to where MMIO might be. Should you perhaps simply start from
> >>>
> >>>Looks like your mailer ate some words.
> >>
> >>I don't think so - they're all there in the text you quoted.
> >>
> >>>>an all 1:1 mapping, inserting the RAM translations as you find
> >>>>them?
> >>>
> >>>
> >>>Yeah, as this code can be called for the regions under 4GB. Definitely
> >>>needs more analysis.
> >>>
> >>>Were you suggesting a lookup when we scan the PCI devices? (xen_add_device)?
> >>
> >>That was for PVH, and is obviously fragile, as there can be MMIO
> >>regions not matched by any PCI device's BAR. We could hope for
> >>all of them to be below 4Gb, but I think (based on logs I got to see
> >>recently from a certain vendor's upcoming systems) this isn't going
> >>to work out.
> >
> >This is the patch I had in mind that I think will fix these issues. But
> >I would appreciate testing it and naturally send me the dmesg if possible.
> 
> I think there is a simpler way to handle this.
> 
> If INVALID_P2M_ENTRY implies 1:1 and we arrange:

I am a bit afraid to make that assumption.
> 
> a) pfn_to_mfn() to return pfn if the mfn is missing in the p2m

The balloon pages are of missing type (initially). And they should
return INVALID_P2M_ENTRY at start - later on they will return the
scratch_page.

> b) mfn_to_pfn() to return mfn if p2m(m2p(mfn)) != mfn and there is
> no m2p override.

The toolstack can map pages for which p2m(m2p(mfn)) != mfn and that
have no m2p override.

> 
> Then:
> 
> a) The identity p2m entries can be removed.
> b) _PAGE_IOMAP becomes unnecessary.

You still need it for the toolstack to map other guests' pages
(xen_privcmd_map).

I think that, for right now, fixing this issue by setting 1-1 in the
P2M for the affected devices (PCI and MCFG) is simpler, because:
 - We only do it when said device is in the guest (so if you launch
   a PCI PV guest you can still migrate it after unplugging the
   device). Assuming all regions are 1-1 might not be healthy (I had
   a heck of a time fixing all of the migration issues when I wrote
   the 1:1 code).
 - It will make the PVH hypercall that marks I/O regions easier. Instead
   of assuming that all non-RAM space is I/O regions, it will be
   able to selectively set up the entries for said regions. I think
   that is what Jan suggested?
 - This is a bug, so let's fix it as a bug first.

Redoing the P2M is certainly an option, but I am not signing
up for that this year. Let me post my two patches that fix
this for PCI devices and MCFG areas.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [Xen-devel] BUG: bad page map under Xen
       [not found]                                           ` <20131025142147.GB3742-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
@ 2013-12-26  6:39                                             ` Zhang, Yang Z
  2014-01-02 14:18                                               ` David Vrabel
       [not found]                                               ` <A9667DDFB95DB7438FA9D7D576C3D87E0A99CE00-0J0gbvR4kTg/UvCtAeCM4rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 2 replies; 40+ messages in thread
From: Zhang, Yang Z @ 2013-12-26  6:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, David Vrabel
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Lukas Hejtmanek, Jan Beulich, xen-devel-GuqFBffKawuEi8DpZVb4nw,
	Liu, Jijiang

Konrad Rzeszutek Wilk wrote on 2013-10-25:
> On Fri, Oct 25, 2013 at 12:08:21AM +0100, David Vrabel wrote:
>> On 23/10/13 16:36, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Oct 21, 2013 at 04:12:56PM +0100, Jan Beulich wrote:
>>>>>>> On 21.10.13 at 16:44, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>>>> On Mon, Oct 21, 2013 at 03:27:50PM +0100, Jan Beulich wrote:
>>>>>>>>> On 21.10.13 at 16:18, Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>>>>>> On Mon, Oct 21, 2013 at 04:06:07PM +0200, Lukas Hejtmanek wrote:
>>>>>>>>         Region 2: Memory at 380fff000000 (64-bit, prefetchable) [size=8M]
>>>>>>> ...
>>>>>>> --- a/arch/x86/xen/setup.c
>>>>>>> +++ b/arch/x86/xen/setup.c
>>>>>>> @@ -92,6 +92,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
>>>>>>> 
>>>>>>>                 __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>>>>>>>         }
>>>>>>> +       /* Anything past the balloon area is marked as identity. */
>>>>>>> +       for (pfn = xen_max_p2m_pfn; pfn < MAX_DOMAIN_PAGES; pfn++)
>>>>>>> +               __set_phys_to_machine(pfn, IDENTITY_FRAME(pfn));
>>>>>> 
>>>>>> Hardly - MAX_DOMAIN_PAGES derives from
>>>>>> CONFIG_XEN_MAX_DOMAIN_MEMORY, which in turn is unrelated to where
>>>>>> MMIO might be. Should you perhaps simply start from
>>>>> 
>>>>> Looks like your mailer ate some words.
>>>> 
>>>> I don't think so - they're all there in the text you quoted.
>>>> 
>>>>>> an all 1:1 mapping, inserting the RAM translations as you find
>>>>>> them?
>>>>> 
>>>>> 
>>>>> Yeah, as this code can be called for the regions under 4GB.
>>>>> Definitely needs more analysis.
>>>>> 
>>>>> Were you suggesting a lookup when we scan the PCI devices?
>>>>> (xen_add_device)?
>>>> 
>>>> That was for PVH, and is obviously fragile, as there can be MMIO
>>>> regions not matched by any PCI device's BAR. We could hope for all
>>>> of them to be below 4Gb, but I think (based on logs I got to see
>>>> recently from a certain vendor's upcoming systems) this isn't
>>>> going to work out.
>>> 
>>> This is the patch I had in mind that I think will fix these issues.
>>> But I would appreciate testing it and naturally send me the dmesg
>>> if possible.
>> 
>> I think there is a simpler way to handle this.
>> 
>> If INVALID_P2M_ENTRY implies 1:1 and we arrange:
> 
> I am a bit afraid to make that assumption.
>> 
>> a) pfn_to_mfn() to return pfn if the mfn is missing in the p2m
> 
> The balloon pages are of missing type (initially). And they should
> return INVALID_P2M_ENTRY at start - later on they will return the scratch_page.
> 
>> b) mfn_to_pfn() to return mfn if p2m(m2p(mfn)) != mfn and there is
>> no m2p override.
> 
> The toolstack can map pages for which p2m(m2p(mfn)) != mfn and that
> have no m2p override.
> 
>> 
>> Then:
>> 
>> a) The identity p2m entries can be removed.
>> b) _PAGE_IOMAP becomes unnecessary.
> 
> You still need it for the toolstack to map other guests' pages
> (xen_privcmd_map).
> 
> I think that, for right now, fixing this issue by setting 1-1 entries
> in the P2M for the affected devices (PCI and MCFG) is simpler, b/c:
>  - We only do it when said device is in the guest (so if you launch
>    a PCI PV guest you can still migrate it - after unplugging the
>    device). Assuming all regions are 1-1 might not be healthy (I had a
>    heck of a time fixing all of the migration issues when I wrote the
>    1:1 code).
>  - It will make the PVH hypercall to mark I/O regions easier.
>    Instead of assuming that all non-RAM space is I/O regions, it will
>    be able to selectively set up the entries for said regions. I think
>    that is what Jan suggested?
>  - This is a bug - so let's fix it as a bug first.
> 
> Redoing the P2M is certainly an option but I am not signing up for
> that this year. Let me post my two patches that fix this for PCI
> devices and MCFG areas.
> 

Any conclusion on this issue? Our customer also saw the same problem:
they want to map MMIO into a userspace address (through the UIO
approach), but the current ->mmap function calls remap_pfn_range
without setting _PAGE_IOMAP, which causes the host to crash. It seems
that any userspace device driver that tries to map a device's MMIO
will crash the host.

They are using 3.10 kernel.
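
The failure mode described here can be pictured with a toy model. The
sketch below is an illustration under stated assumptions, not the
kernel's implementation: it assumes that a PV kernel translates the frame
number of an ordinary PTE through the p2m, that a flag playing the role
of _PAGE_IOMAP tells it to use the frame number as a machine frame
unchanged, and that a missing p2m entry yields an unusable (non-present)
entry. All toy_ names, flag values and table contents are hypothetical.

```c
#include <stdint.h>

#define TOY_NR_FRAMES   8u
#define TOY_INVALID     (~(uint64_t)0)  /* stand-in for INVALID_P2M_ENTRY */
#define TOY_PTE_PRESENT 0x1u
#define TOY_PTE_IOMAP   0x2u            /* stand-in for _PAGE_IOMAP */

struct toy_pte {
    uint64_t frame;      /* frame number stored in the entry */
    unsigned int flags;  /* 0 means "not present" in this model */
};

/* Hypothetical p2m: pfns 0-3 are RAM, pfns 4-7 (e.g. MMIO) have no entry. */
static const uint64_t toy_p2m[TOY_NR_FRAMES] = {
    40, 41, 42, 43,
    TOY_INVALID, TOY_INVALID, TOY_INVALID, TOY_INVALID,
};

/* Build a page-table entry for pfn the way the assumptions above describe. */
struct toy_pte toy_make_pte(uint64_t pfn, unsigned int flags)
{
    struct toy_pte pte = { 0, 0 };
    uint64_t mfn;

    if (flags & TOY_PTE_IOMAP) {
        /* I/O mapping: the frame number is used as-is, no p2m lookup. */
        pte.frame = pfn;
        pte.flags = flags;
        return pte;
    }

    /* Ordinary mapping: translate through the p2m. */
    mfn = pfn < TOY_NR_FRAMES ? toy_p2m[pfn] : TOY_INVALID;
    if (mfn == TOY_INVALID)
        return pte;  /* no translation: the entry ends up unusable */

    pte.frame = mfn;
    pte.flags = flags;
    return pte;
}
```

In this model, mapping MMIO pfn 6 with only TOY_PTE_PRESENT produces an
empty entry, while the same call with TOY_PTE_IOMAP added keeps frame 6.
RAM pfn 1 is translated to frame 41 either way the flag is left clear.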

> _______________________________________________
> Xen-devel mailing list
> Xen-devel-GuqFBffKawuEi8DpZVb4nw@public.gmane.org
> http://lists.xen.org/xen-devel


Best regards,
Yang


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xen-devel] BUG: bad page map under Xen
       [not found]                                               ` <A9667DDFB95DB7438FA9D7D576C3D87E0A99CE00-0J0gbvR4kTg/UvCtAeCM4rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2014-01-02 14:18                                                 ` David Vrabel
  0 siblings, 0 replies; 40+ messages in thread
From: David Vrabel @ 2014-01-02 14:18 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Konrad Rzeszutek Wilk, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	xen-devel-GuqFBffKawuEi8DpZVb4nw, Lukas Hejtmanek, Jan Beulich,
	Liu, Jijiang

On 26/12/13 06:39, Zhang, Yang Z wrote:
> 
> Any conclusion on this issue? Our customer also saw the same problem:
> they want to map MMIO into a userspace address (through the UIO
> approach), but the current ->mmap function calls remap_pfn_range
> without setting _PAGE_IOMAP, which causes the host to crash. It seems
> that any userspace device driver that tries to map a device's MMIO
> will crash the host.
> 
> They are using 3.10 kernel.

My first attempt at fixing this was broken and I need to find time for
another look.

There were problems with pages that started out pre-ballooned.

David

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2014-01-02 14:18 UTC | newest]

Thread overview: 40+ messages
2013-10-21 11:57 BUG: bad page map under Xen Lukas Hejtmanek
2013-10-21 12:59 ` konrad wilk
2013-10-21 13:14 ` Jan Beulich
     [not found] ` <20131021115740.GN20913-8qz54MUs51PtwjQa/ONI9g@public.gmane.org>
2013-10-21 12:59   ` [Xen-devel] " konrad wilk
     [not found]     ` <52652534.2040303-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-10-21 13:18       ` Jan Beulich
     [not found]         ` <526545E002000078000FC5F1-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
2013-10-21 13:39           ` konrad wilk
     [not found]             ` <52652E95.3020305-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-10-21 13:57               ` konrad wilk
2013-10-21 14:06               ` Lukas Hejtmanek
2013-10-21 14:18                 ` Konrad Rzeszutek Wilk
2013-10-21 14:23                   ` Lukas Hejtmanek
     [not found]                   ` <20131021141855.GA4211-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
2013-10-21 14:23                     ` [Xen-devel] " Lukas Hejtmanek
2013-10-21 14:27                     ` Jan Beulich
     [not found]                       ` <5265560602000078000FC73E-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
2013-10-21 14:44                         ` Konrad Rzeszutek Wilk
2013-10-21 15:12                           ` Jan Beulich
     [not found]                           ` <20131021144407.GC4560-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
2013-10-21 15:12                             ` [Xen-devel] " Jan Beulich
2013-10-23 15:36                               ` Konrad Rzeszutek Wilk
     [not found]                               ` <5265609802000078000FC7B7-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
2013-10-23 15:36                                 ` [Xen-devel] " Konrad Rzeszutek Wilk
2013-10-23 15:45                                   ` Jan Beulich
     [not found]                                   ` <20131023153645.GA28011-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
2013-10-23 15:45                                     ` [Xen-devel] " Jan Beulich
2013-10-23 16:04                                       ` Konrad Rzeszutek Wilk
     [not found]                                       ` <5267FD3102000078000A56A1-ce6RLXgGx+vWGUEhTRrCg1aTQe2KTcn/@public.gmane.org>
2013-10-23 16:04                                         ` [Xen-devel] " Konrad Rzeszutek Wilk
     [not found]                                           ` <20131023160433.GA28260-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
2013-10-23 16:35                                             ` Jan Beulich
2013-10-23 16:35                                           ` Jan Beulich
2013-10-24 23:08                                     ` [Xen-devel] " David Vrabel
     [not found]                                       ` <5269A865.2010100-5LkwijKnu/2sTnJN9+BGXg@public.gmane.org>
2013-10-25 14:21                                         ` Konrad Rzeszutek Wilk
     [not found]                                           ` <20131025142147.GB3742-6K5HmflnPlqSPmnEAIUT9EEOCMrvLtNR@public.gmane.org>
2013-12-26  6:39                                             ` Zhang, Yang Z
2014-01-02 14:18                                               ` David Vrabel
     [not found]                                               ` <A9667DDFB95DB7438FA9D7D576C3D87E0A99CE00-0J0gbvR4kTg/UvCtAeCM4rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2014-01-02 14:18                                                 ` [Xen-devel] " David Vrabel
2013-12-26  6:39                                           ` Zhang, Yang Z
2013-10-25 14:21                                       ` Konrad Rzeszutek Wilk
2013-10-24 23:08                                   ` David Vrabel
2013-10-21 14:44                       ` Konrad Rzeszutek Wilk
2013-10-21 14:27                   ` Jan Beulich
     [not found]                 ` <20131021140607.GQ20913-8qz54MUs51PtwjQa/ONI9g@public.gmane.org>
2013-10-21 14:20                   ` [Xen-devel] " Jan Beulich
2013-10-21 14:20                 ` Jan Beulich
2013-10-21 13:57             ` konrad wilk
2013-10-21 14:06             ` Lukas Hejtmanek
2013-10-21 13:39         ` konrad wilk
2013-10-21 13:18     ` Jan Beulich
2013-10-21 13:14   ` [Xen-devel] " Jan Beulich
