All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oded Gabbay <oded.gabbay@gmail.com>
To: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: "Christian König" <christian.koenig@amd.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	iommu@lists.linux-foundation.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFT v2] iommu/amd: use subsys_initcall() on amdv2 iommu
Date: Mon, 18 Apr 2016 09:48:11 +0300	[thread overview]
Message-ID: <CAFCwf12SJ-dTv6PC0_KfHbtC9951xb_4v8wu5uSjXO-V3TgdkA@mail.gmail.com> (raw)
In-Reply-To: <20160412220715.GL1990@wotan.suse.de>

On Wed, Apr 13, 2016 at 1:07 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Mon, Apr 11, 2016 at 03:52:43PM +0200, Christian König wrote:
>> Am 11.04.2016 um 15:39 schrieb Oded Gabbay:
>> >On Mon, Apr 11, 2016 at 4:28 PM, Christian König
>> ><christian.koenig@amd.com> wrote:
>> >>Am 09.04.2016 um 02:25 schrieb Luis R. Rodriguez:
>> >>>On Tue, Mar 29, 2016 at 10:41 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> >>>>We need to ensure amd iommu v2 initializes before
>> >>>>driver uses such as drivers/gpu/drm/amd/amdkfd/kfd_module.c,
>> >>>>to do this make its init routine a subsys_initcall() which
>> >>>>ensures its load init is called first than modules when
>> >>>>built-in.
>> >>>>
>> >>>>This reverts the old work around implemented through commit
>> >>>>1bacc894c227fad8a7 ("drivers: Move iommu/ before gpu/ in Makefile"),
>> >>>>instead of making the dependency implicit by linker order this
>> >>>>makes the ordering requirement explicit through proper kernel
>> >>>>APIs.
>> >>>>
>> >>>>Cc: Oded Gabbay <oded.gabbay@amd.com>
>> >>>>Cc: Christian König <christian.koenig@amd.com>
>> >>>>Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
>> >>
>> >>Sorry for not responding earlier. Just coming back to all the stuff on my TODO list.
>> >>
>> >>Patch is Acked-by: Christian König <christian.koenig@amd.com>
>> >
>> >Christian,
>> >Just wanted to be sure if you tested this patch-set or not.
>>
>> I did NOT tested it. If AMD IOMMU requires something which will now
>> initialize after the IOMMU module we will obviously run into trouble
>> again.
>>
>> I assumed that the creator of the patch did some testing.
>
> Nope, hence [RTF] Request For Testing.
>
>> >I don't think it should be merged without testing. If you already
>> >tested it than fine. If not, I think I can do it in the next week or
>> >so (just came back from PTO).
>>
>> Yeah, agree totally.
>
> Agreed, please let me know if someone is able to test and confirm
> this works. It should work.
>
>   Luis

Hi,
So I finally got to test this patch and it's not working.
The reason is that AMD IOMMUv2 gets initialized *before* AMD IOMMUv1 driver !
So, we get this from dmesg:

[    0.439612] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    0.439614] AMD IOMMUv2 functionality not available on this system

later we get:
[    1.084749] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    1.084750] AMD-Vi:  Extended features:  PPR GT IA PC
[    1.084754] AMD-Vi: Interrupt remapping enabled
[    1.085125] AMD-Vi: Lazy IO/TLB flushing enabled


And when I run some HSA samples, at the tear-down process stage we get this:

[ 7937.740306] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000020
[ 7937.740932] IP: [<ffffffff819661c2>] mutex_lock+0x12/0x30
[ 7937.741534] PGD 55a57067 PUD 55a56067 PMD 0
[ 7937.742127] Oops: 0002 [#1] SMP
[ 7937.742709] Modules linked in: xt_CHECKSUM iptable_mangle
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge
stp llc ebtable_filter ebtables ip6table_filter ip6_tables
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi
snd_hda_intel snd_hda_codec kvm_amd kvm snd_hda_core snd_hwdep snd_seq
edac_mce_amd edac_core snd_seq_device irqbypass crct10dif_pclmul
crc32_pclmul ppdev pcspkr crc32c_intel snd_pcm fuse
ghash_clmulni_intel sp5100_tco acpi_cpufreq parport_pc i2c_piix4
snd_timer fam15h_power k10temp video tpm_infineon tpm_tis parport
shpchp snd tpm soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc
serio_raw uas usb_storage r8169 mii fjes
[ 7937.746867] CPU: 2 PID: 2260 Comm: kfdtest Not tainted 4.6.0-rc3 #5
[ 7937.747600] Hardware name: Gigabyte Technology Co., Ltd. To be
filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
[ 7937.748363] task: ffff88006944d880 ti: ffff88005b734000 task.ti:
ffff88005b734000
[ 7937.749136] RIP: 0010:[<ffffffff819661c2>]  [<ffffffff819661c2>]
mutex_lock+0x12/0x30
[ 7937.749918] RSP: 0018:ffff88005b737cc8  EFLAGS: 00010246
[ 7937.750695] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
[ 7937.751491] RDX: 0000000080000000 RSI: ffff880000000000 RDI: 0000000000000020
[ 7937.752286] RBP: ffff88005b737cd0 R08: 0000000000000002 R09: ffffffff815321e2
[ 7937.753084] R10: ffffea00016f71c0 R11: 0000000000000246 R12: ffff880035239680
[ 7937.753884] R13: 0000000000000000 R14: 0000000000000001 R15: ffff88005b737d10
[ 7937.754684] FS:  00007fe0bc68a740(0000) GS:ffff88006d500000(0000)
knlGS:0000000000000000
[ 7937.755504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7937.756328] CR2: 0000000000000020 CR3: 0000000057198000 CR4: 00000000000406e0
[ 7937.757168] Stack:
[ 7937.758004]  0000000000000000 ffff88005b737d80 ffffffff810bae2c
ffff88005b737d48
[ 7937.758872]  0000000000000020 ffff880000000000 0000000000000000
ffff88005b737d38
[ 7937.759737]  ffff88005b737d38 ffff88005b737d10 ffff88005b737d10
ffff8800ffffffff
[ 7937.760606] Call Trace:
[ 7937.761474]  [<ffffffff810bae2c>] flush_workqueue+0x9c/0x530
[ 7937.762348]  [<ffffffff818107a4>] ? amd_iommu_domain_clear_gcr3+0x84/0xa0
[ 7937.763225]  [<ffffffff818157d0>] mn_release+0x60/0x70
[ 7937.764107]  [<ffffffff81210c64>] __mmu_notifier_release+0x44/0xc0
[ 7937.764998]  [<ffffffff811efbca>] exit_mmap+0x15a/0x170
[ 7937.765889]  [<ffffffff811e8b33>] ? handle_mm_fault+0x14e3/0x1d50
[ 7937.766784]  [<ffffffff814b06d0>] ? n_tty_open+0xd0/0xd0
[ 7937.767677]  [<ffffffff81124cac>] ? exit_robust_list+0x5c/0x110
[ 7937.768573]  [<ffffffff810a199b>] mmput+0x5b/0x100
[ 7937.769463]  [<ffffffff810a8566>] do_exit+0x276/0xb30
[ 7937.770349]  [<ffffffff810684a5>] ? __do_page_fault+0x205/0x4d0
[ 7937.771226]  [<ffffffff810a8ea7>] do_group_exit+0x47/0xb0
[ 7937.772090]  [<ffffffff810a8f24>] SyS_exit_group+0x14/0x20
[ 7937.772941]  [<ffffffff819684f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 7937.773801] Code: 83 f8 01 0f 85 6d ff ff ff eb db e8 f9 e4 73 ff
66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8
1e e4 ff ff <f0> ff 0b 79 08 48 89 df e8 c1 fe ff ff 65 48 8b 04 25 c0
d2 00
[ 7937.775671] RIP  [<ffffffff819661c2>] mutex_lock+0x12/0x30
[ 7937.776588]  RSP <ffff88005b737cc8>
[ 7937.777499] CR2: 0000000000000020
[ 7937.781746] ---[ end trace 46aeb31f738c07ff ]---
[ 7937.781748] Fixing recursive fault but reboot is needed!

WARNING: multiple messages have this Message-ID (diff)
From: Oded Gabbay <oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: "Luis R. Rodriguez" <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: "Christian König" <christian.koenig-5C7GfCeVMHo@public.gmane.org>,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [RFT v2] iommu/amd: use subsys_initcall() on amdv2 iommu
Date: Mon, 18 Apr 2016 09:48:11 +0300	[thread overview]
Message-ID: <CAFCwf12SJ-dTv6PC0_KfHbtC9951xb_4v8wu5uSjXO-V3TgdkA@mail.gmail.com> (raw)
In-Reply-To: <20160412220715.GL1990-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>

On Wed, Apr 13, 2016 at 1:07 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Mon, Apr 11, 2016 at 03:52:43PM +0200, Christian König wrote:
>> Am 11.04.2016 um 15:39 schrieb Oded Gabbay:
>> >On Mon, Apr 11, 2016 at 4:28 PM, Christian König
>> ><christian.koenig@amd.com> wrote:
>> >>Am 09.04.2016 um 02:25 schrieb Luis R. Rodriguez:
>> >>>On Tue, Mar 29, 2016 at 10:41 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> >>>>We need to ensure amd iommu v2 initializes before
>> >>>>driver uses such as drivers/gpu/drm/amd/amdkfd/kfd_module.c,
>> >>>>to do this make its init routine a subsys_initcall() which
>> >>>>ensures its load init is called first than modules when
>> >>>>built-in.
>> >>>>
>> >>>>This reverts the old work around implemented through commit
>> >>>>1bacc894c227fad8a7 ("drivers: Move iommu/ before gpu/ in Makefile"),
>> >>>>instead of making the dependency implicit by linker order this
>> >>>>makes the ordering requirement explicit through proper kernel
>> >>>>APIs.
>> >>>>
>> >>>>Cc: Oded Gabbay <oded.gabbay@amd.com>
>> >>>>Cc: Christian König <christian.koenig@amd.com>
>> >>>>Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
>> >>
>> >>Sorry for not responding earlier. Just coming back to all the stuff on my TODO list.
>> >>
>> >>Patch is Acked-by: Christian König <christian.koenig@amd.com>
>> >
>> >Christian,
>> >Just wanted to be sure if you tested this patch-set or not.
>>
>> I did NOT tested it. If AMD IOMMU requires something which will now
>> initialize after the IOMMU module we will obviously run into trouble
>> again.
>>
>> I assumed that the creator of the patch did some testing.
>
> Nope, hence [RTF] Request For Testing.
>
>> >I don't think it should be merged without testing. If you already
>> >tested it than fine. If not, I think I can do it in the next week or
>> >so (just came back from PTO).
>>
>> Yeah, agree totally.
>
> Agreed, please let me know if someone is able to test and confirm
> this works. It should work.
>
>   Luis

Hi,
So I finally got to test this patch and it's not working.
The reason is that AMD IOMMUv2 gets initialized *before* AMD IOMMUv1 driver !
So, we get this from dmesg:

[    0.439612] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    0.439614] AMD IOMMUv2 functionality not available on this system

later we get:
[    1.084749] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    1.084750] AMD-Vi:  Extended features:  PPR GT IA PC
[    1.084754] AMD-Vi: Interrupt remapping enabled
[    1.085125] AMD-Vi: Lazy IO/TLB flushing enabled


And when I run some HSA samples, at the tear-down process stage we get this:

[ 7937.740306] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000020
[ 7937.740932] IP: [<ffffffff819661c2>] mutex_lock+0x12/0x30
[ 7937.741534] PGD 55a57067 PUD 55a56067 PMD 0
[ 7937.742127] Oops: 0002 [#1] SMP
[ 7937.742709] Modules linked in: xt_CHECKSUM iptable_mangle
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge
stp llc ebtable_filter ebtables ip6table_filter ip6_tables
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi
snd_hda_intel snd_hda_codec kvm_amd kvm snd_hda_core snd_hwdep snd_seq
edac_mce_amd edac_core snd_seq_device irqbypass crct10dif_pclmul
crc32_pclmul ppdev pcspkr crc32c_intel snd_pcm fuse
ghash_clmulni_intel sp5100_tco acpi_cpufreq parport_pc i2c_piix4
snd_timer fam15h_power k10temp video tpm_infineon tpm_tis parport
shpchp snd tpm soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc
serio_raw uas usb_storage r8169 mii fjes
[ 7937.746867] CPU: 2 PID: 2260 Comm: kfdtest Not tainted 4.6.0-rc3 #5
[ 7937.747600] Hardware name: Gigabyte Technology Co., Ltd. To be
filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
[ 7937.748363] task: ffff88006944d880 ti: ffff88005b734000 task.ti:
ffff88005b734000
[ 7937.749136] RIP: 0010:[<ffffffff819661c2>]  [<ffffffff819661c2>]
mutex_lock+0x12/0x30
[ 7937.749918] RSP: 0018:ffff88005b737cc8  EFLAGS: 00010246
[ 7937.750695] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
[ 7937.751491] RDX: 0000000080000000 RSI: ffff880000000000 RDI: 0000000000000020
[ 7937.752286] RBP: ffff88005b737cd0 R08: 0000000000000002 R09: ffffffff815321e2
[ 7937.753084] R10: ffffea00016f71c0 R11: 0000000000000246 R12: ffff880035239680
[ 7937.753884] R13: 0000000000000000 R14: 0000000000000001 R15: ffff88005b737d10
[ 7937.754684] FS:  00007fe0bc68a740(0000) GS:ffff88006d500000(0000)
knlGS:0000000000000000
[ 7937.755504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7937.756328] CR2: 0000000000000020 CR3: 0000000057198000 CR4: 00000000000406e0
[ 7937.757168] Stack:
[ 7937.758004]  0000000000000000 ffff88005b737d80 ffffffff810bae2c
ffff88005b737d48
[ 7937.758872]  0000000000000020 ffff880000000000 0000000000000000
ffff88005b737d38
[ 7937.759737]  ffff88005b737d38 ffff88005b737d10 ffff88005b737d10
ffff8800ffffffff
[ 7937.760606] Call Trace:
[ 7937.761474]  [<ffffffff810bae2c>] flush_workqueue+0x9c/0x530
[ 7937.762348]  [<ffffffff818107a4>] ? amd_iommu_domain_clear_gcr3+0x84/0xa0
[ 7937.763225]  [<ffffffff818157d0>] mn_release+0x60/0x70
[ 7937.764107]  [<ffffffff81210c64>] __mmu_notifier_release+0x44/0xc0
[ 7937.764998]  [<ffffffff811efbca>] exit_mmap+0x15a/0x170
[ 7937.765889]  [<ffffffff811e8b33>] ? handle_mm_fault+0x14e3/0x1d50
[ 7937.766784]  [<ffffffff814b06d0>] ? n_tty_open+0xd0/0xd0
[ 7937.767677]  [<ffffffff81124cac>] ? exit_robust_list+0x5c/0x110
[ 7937.768573]  [<ffffffff810a199b>] mmput+0x5b/0x100
[ 7937.769463]  [<ffffffff810a8566>] do_exit+0x276/0xb30
[ 7937.770349]  [<ffffffff810684a5>] ? __do_page_fault+0x205/0x4d0
[ 7937.771226]  [<ffffffff810a8ea7>] do_group_exit+0x47/0xb0
[ 7937.772090]  [<ffffffff810a8f24>] SyS_exit_group+0x14/0x20
[ 7937.772941]  [<ffffffff819684f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 7937.773801] Code: 83 f8 01 0f 85 6d ff ff ff eb db e8 f9 e4 73 ff
66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8
1e e4 ff ff <f0> ff 0b 79 08 48 89 df e8 c1 fe ff ff 65 48 8b 04 25 c0
d2 00
[ 7937.775671] RIP  [<ffffffff819661c2>] mutex_lock+0x12/0x30
[ 7937.776588]  RSP <ffff88005b737cc8>
[ 7937.777499] CR2: 0000000000000020
[ 7937.781746] ---[ end trace 46aeb31f738c07ff ]---
[ 7937.781748] Fixing recursive fault but reboot is needed!
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2016-04-18  6:48 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-15 22:12 [RFT] iommu/amd: use subsys_initcall() on amdv2 iommu Luis R. Rodriguez
2016-03-16  7:02 ` Oded Gabbay
2016-03-16  7:02   ` Oded Gabbay
2016-03-16 10:14   ` Joerg Roedel
2016-03-16 10:14     ` Joerg Roedel
2016-03-16 10:16     ` Oded Gabbay
2016-03-16 16:17       ` Luis R. Rodriguez
2016-03-16 16:39         ` Joerg Roedel
2016-03-16 16:57           ` Luis R. Rodriguez
2016-03-16 17:17             ` Joerg Roedel
2016-03-16 17:17               ` Joerg Roedel
2016-03-29 17:41               ` [RFT v2] " Luis R. Rodriguez
2016-03-29 17:41                 ` Luis R. Rodriguez
2016-04-09  0:25                 ` Luis R. Rodriguez
2016-04-09  0:25                   ` Luis R. Rodriguez
2016-04-11 13:28                   ` Christian König
2016-04-11 13:28                     ` Christian König
2016-04-11 13:39                     ` Oded Gabbay
2016-04-11 13:39                       ` Oded Gabbay
2016-04-11 13:52                       ` Christian König
2016-04-11 13:52                         ` Christian König
2016-04-12 22:07                         ` Luis R. Rodriguez
2016-04-12 22:07                           ` Luis R. Rodriguez
2016-04-18  6:48                           ` Oded Gabbay [this message]
2016-04-18  6:48                             ` Oded Gabbay
     [not found]                             ` <CAFCwf12SJ-dTv6PC0_KfHbtC9951xb_4v8wu5uSjXO-V3TgdkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-18  6:55                               ` Luis R. Rodriguez
2016-04-18  7:02                                 ` Oded Gabbay
2016-04-18  7:02                                   ` Oded Gabbay
2016-04-18 12:03                                   ` Luis R. Rodriguez
2016-04-18 12:03                                     ` Luis R. Rodriguez
2016-04-19  2:02                                     ` Wan Zongshun
2016-04-19  2:02                                       ` Wan Zongshun
2016-05-27  0:12                                       ` Luis R. Rodriguez
2016-05-27  0:12                                         ` Luis R. Rodriguez
2016-04-25 10:23                                     ` Joerg Roedel
2016-04-25 10:23                                       ` Joerg Roedel
2016-05-27  0:46                                       ` Luis R. Rodriguez
2016-05-27  1:18                                         ` [RFT v3] drm: use late_initcall() for amdkfd and radeon Luis R. Rodriguez
2016-05-29 14:49                                           ` Oded Gabbay
2016-05-29 14:49                                             ` Oded Gabbay
2016-05-31 17:15                                             ` Luis R. Rodriguez
2016-05-31 17:15                                               ` Luis R. Rodriguez
2016-05-31 17:33                                               ` Oded Gabbay
2016-05-31 17:33                                                 ` Oded Gabbay
2016-05-29 18:27                                           ` Daniel Vetter
2016-05-29 18:27                                             ` Daniel Vetter
2016-05-31 16:58                                             ` Luis R. Rodriguez
2016-05-31 19:04                                               ` Daniel Vetter
2016-05-31 19:04                                                 ` Daniel Vetter
2016-06-01 21:11                                                 ` Luis R. Rodriguez
2016-11-10 22:12                                                   ` Luis R. Rodriguez
2016-11-10 22:12                                                     ` Luis R. Rodriguez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFCwf12SJ-dTv6PC0_KfHbtC9951xb_4v8wu5uSjXO-V3TgdkA@mail.gmail.com \
    --to=oded.gabbay@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.