bpf.vger.kernel.org archive mirror
From: Song Liu <songliubraving@fb.com>
To: Christoph Hellwig <hch@infradead.org>,
	"rick.p.edgecombe@intel.com" <rick.p.edgecombe@intel.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Song Liu <song@kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
	X86 ML <x86@kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"andrii@kernel.org" <andrii@kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"pmenzel@molgen.mpg.de" <pmenzel@molgen.mpg.de>
Subject: Re: [PATCH bpf 0/4] introduce HAVE_ARCH_HUGE_VMALLOC_FLAG for bpf_prog_pack
Date: Fri, 1 Apr 2022 22:22:00 +0000
Message-ID: <6AA91984-7DF3-4820-91DF-DD6CA251B638@fb.com>
In-Reply-To: <F3447905-8D42-46C0-B324-988A0E4E52E7@fb.com>

+ Nicholas and Claudio,


> On Mar 31, 2022, at 4:59 PM, Song Liu <songliubraving@fb.com> wrote:
> 
> Hi Christoph, 
> 
>> On Mar 30, 2022, at 10:37 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> 
>> On Wed, Mar 30, 2022 at 03:56:38PM -0700, Song Liu wrote:
>>> We prematurely enabled HAVE_ARCH_HUGE_VMALLOC for x86_64, which could cause
>>> issues [1], [2].
>>> 
>> 
>> Please fix the underlying issues instead of papering over them and
>> creating a huge maintenance burden for others.

After reading the code a little more, I wonder what the best strategy
would be. IIUC, most of the kernel is not ready for huge page backed
vmalloc memory. For example, module_alloc() cannot work with huge pages
at the moment. And the error Paul Menzel reported in drm_fb_helper.c will
probably also hit powerpc with the 5.17 kernel as-is? (trace attached below)

Right now, we have VM_NO_HUGE_VMAP to let a user opt out of huge pages.
However, given that there are so many users of vmalloc, vzalloc, etc., we
probably need a flag that lets a user opt in instead?
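
To make the difference concrete, here is a minimal userspace sketch of the
two policies. It is not kernel code: the flag values, the 2 MiB threshold,
and the helper names use_huge_opt_out(), use_huge_opt_in() and
check_example() are assumptions made for illustration. VM_TRY_HUGE_VMAP is
the opt-in flag name from patch 4/4 of this series. The real decision in
the vmalloc code also depends on arch support and other conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; these are NOT the kernel's definitions. */
#define VM_NO_HUGE_VMAP   0x1UL  /* existing opt-out flag */
#define VM_TRY_HUGE_VMAP  0x2UL  /* opt-in flag named in patch 4/4 */

/* Assumed huge page size (PMD size on x86_64 is 2 MiB). */
#define HUGE_SIZE (2UL * 1024 * 1024)

/* Opt-out: every large allocation may be huge unless the caller objects. */
static inline bool use_huge_opt_out(size_t size, unsigned long vm_flags)
{
	return size >= HUGE_SIZE && (vm_flags & VM_NO_HUGE_VMAP) == 0;
}

/* Opt-in: only callers that explicitly ask for huge pages may get them. */
static inline bool use_huge_opt_in(size_t size, unsigned long vm_flags)
{
	return size >= HUGE_SIZE && (vm_flags & VM_TRY_HUGE_VMAP) != 0;
}

/* An unaudited caller passes no flags at all for a 4 MiB allocation. */
static inline void check_example(void)
{
	size_t size = 4UL * 1024 * 1024;

	/* Opt-out model: the unaudited caller silently gets huge pages. */
	assert(use_huge_opt_out(size, 0));
	/* Opt-in model: the unaudited caller keeps 4K-backed memory. */
	assert(!use_huge_opt_in(size, 0));
	/* Only a caller that sets the opt-in flag is eligible. */
	assert(use_huge_opt_in(size, VM_TRY_HUGE_VMAP));
}
```

With the opt-out model, every existing vmalloc/vzalloc caller has to be
audited and tagged; with the opt-in model, only callers known to be safe
(e.g. bpf_prog_pack) would need to change.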

Does this make sense? Any recommendations are really appreciated. 

Thanks,
Song 




[    1.687983] BUG: Bad page state in process systemd-udevd  pfn:102e03
[    1.687992] fbcon: Taking over console
[    1.688007] page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x3 pfn:0x102e03
[    1.688011] head:(____ptrval____) order:9 compound_mapcount:0 compound_pincount:0
[    1.688013] flags: 0x2fffc000010000(head|node=0|zone=2|lastcpupid=0x3fff)
[    1.688018] raw: 002fffc000000000 ffffe815040b8001 ffffe815040b80c8 0000000000000000
[    1.688020] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    1.688022] head: 002fffc000010000 0000000000000000 dead000000000122 0000000000000000
[    1.688023] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    1.688024] page dumped because: corrupted mapping in tail page
[    1.688025] Modules linked in: r8169(+) k10temp snd_pcm(+) xhci_hcd snd_timer realtek ohci_hcd ehci_pci(+) i2c_piix4 ehci_hcd radeon(+) snd sg drm_ttm_helper ttm soundcore coreboot_table acpi_cpufreq fuse ipv6 autofs4
[    1.688045] CPU: 1 PID: 151 Comm: systemd-udevd Not tainted 5.16.0-11615-gfac54e2bfb5b #319
[    1.688048] Hardware name: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 4.16-337-gb87986e67b 03/25/2022
[    1.688050] Call Trace:
[    1.688051]  <TASK>
[    1.688053]  dump_stack_lvl+0x34/0x44
[    1.688059]  bad_page.cold+0x63/0x94
[    1.688063]  free_tail_pages_check+0xd1/0x110
[    1.688067]  ? _raw_spin_lock+0x13/0x30
[    1.688071]  free_pcp_prepare+0x251/0x2e0
[    1.688075]  free_unref_page+0x1d/0x110
[    1.688078]  __vunmap+0x28a/0x380
[    1.688082]  drm_fbdev_cleanup+0x5f/0xb0
[    1.688085]  drm_fbdev_fb_destroy+0x15/0x30
[    1.688087]  unregister_framebuffer+0x1d/0x30
[    1.688091]  drm_client_dev_unregister+0x69/0xe0
[    1.688095]  drm_dev_unregister+0x2e/0x80
[    1.688098]  drm_dev_unplug+0x21/0x40
[    1.688100]  simpledrm_remove+0x11/0x20
[    1.688103]  platform_remove+0x1f/0x40
[    1.688106]  __device_release_driver+0x17a/0x240
[    1.688109]  device_release_driver+0x24/0x30
[    1.688112]  bus_remove_device+0xd8/0x140
[    1.688115]  device_del+0x18b/0x3f0
[    1.688118]  ? _raw_spin_unlock_irqrestore+0x1b/0x30
[    1.688121]  ? try_to_wake_up+0x94/0x5b0
[    1.688124]  platform_device_del.part.0+0x13/0x70
[    1.688127]  platform_device_unregister+0x1c/0x30
[    1.688130]  drm_aperture_detach_drivers+0xa1/0xd0
[    1.688134]  drm_aperture_remove_conflicting_pci_framebuffers+0x3f/0x60
[    1.688137]  radeon_pci_probe+0x54/0xf0 [radeon]
[    1.688212]  local_pci_probe+0x45/0x80
[    1.688216]  ? pci_match_device+0xd7/0x130
[    1.688219]  pci_device_probe+0xc2/0x1d0
[    1.688223]  really_probe+0x1f5/0x3d0
[    1.688226]  __driver_probe_device+0xfe/0x180
[    1.688229]  driver_probe_device+0x1e/0x90
[    1.688232]  __driver_attach+0xc0/0x1c0
[    1.688235]  ? __device_attach_driver+0xe0/0xe0
[    1.688237]  ? __device_attach_driver+0xe0/0xe0
[    1.688239]  bus_for_each_dev+0x78/0xc0
[    1.688242]  bus_add_driver+0x149/0x1e0
[    1.688245]  driver_register+0x8f/0xe0
[    1.688248]  ? 0xffffffffc051d000
[    1.688250]  do_one_initcall+0x44/0x200
[    1.688254]  ? kmem_cache_alloc_trace+0x170/0x2c0
[    1.688257]  do_init_module+0x5c/0x260
[    1.688262]  __do_sys_finit_module+0xb4/0x120
[    1.688266]  __do_fast_syscall_32+0x6b/0xe0
[    1.688270]  do_fast_syscall_32+0x2f/0x70
[    1.688272]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[    1.688275] RIP: 0023:0xf7e51549
[    1.688278] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 cd 0f 05 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
[    1.688281] RSP: 002b:00000000ffa1666c EFLAGS: 00200292 ORIG_RAX: 000000000000015e
[    1.688285] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00000000f7e30e09
[    1.688287] RDX: 0000000000000000 RSI: 00000000f9a705d0 RDI: 00000000f9a6f6a0
[    1.688288] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[    1.688290] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    1.688291] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    1.688294]  </TASK>
[    1.688355] Disabling lock debugging due to kernel taint
