From: Steven Rostedt <rostedt@goodmis.org>
To: Joerg Roedel <jroedel@suse.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shile Zhang <shile.zhang@linux.alibaba.com>,
	Andy Lutomirski <luto@amacapital.net>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Subject: Re: [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu()
Date: Mon, 4 May 2020 13:40:42 -0400
Message-ID: <20200504134042.178409c3@gandalf.local.home>
In-Reply-To: <20200504151236.GI8135@suse.de>

On Mon, 4 May 2020 17:12:36 +0200
Joerg Roedel <jroedel@suse.de> wrote:

> On Thu, Apr 30, 2020 at 10:39:19PM -0400, Steven Rostedt wrote:
> > What's so damn special about alloc_percpu()? It's definitely not a fast
> > path. And it's not used often.  
> 
> Okay, I fixed it in the percpu code. It is definitely not a nice
> solution, but having to call vmalloc_sync_mappings/unmappings() is not a
> nice solution at any place in the code. Here is the patch which fixes
> this issue for me. I am also not sure what to put in the Fixes tag, as
> it is related to tracing code accessing per-cpu data from the page-fault
> handler, and I am not sure when this got introduced. Maybe someone else
> can provide a meaningful Fixes or stable tag.
> 
> I also have an idea of how to make this all more robust and get rid
> of the vmalloc_sync_mappings/unmappings() interface, will show more when
> I know it works the way I think it does.
> 
>

It seems your patch triggers a lockdep splat on my box:

 ========================================================
 WARNING: possible irq lock inversion dependency detected
 5.7.0-rc3-test+ #249 Not tainted
 --------------------------------------------------------
 swapper/4/0 just changed the state of lock:
 ffff9a580fdd75a0 (&ndev->lock){++.-}-{2:2}, at: mld_ifc_timer_expire+0x3c/0x350
 but this lock took another, SOFTIRQ-unsafe lock in the past:
  (pgd_lock){+.+.}-{2:2}
 
 
 and interrupts could create inverse lock ordering between them.
 
 
 other info that might help us debug this:
  Possible interrupt unsafe locking scenario:
 
        CPU0                    CPU1
        ----                    ----
   lock(pgd_lock);
                                local_irq_disable();
                                lock(&ndev->lock);
                                lock(pgd_lock);
   <Interrupt>
     lock(&ndev->lock);
 
  *** DEADLOCK ***
 
 1 lock held by swapper/4/0:
  #0: ffff9a581ab05e70 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x2f0
 
 the shortest dependencies between 2nd lock and 1st lock:
  -> (pgd_lock){+.+.}-{2:2} {
     HARDIRQ-ON-W at:
                       lock_acquire+0xda/0x3d0
                       _raw_spin_lock+0x2f/0x40
                       sync_global_pgds_l4+0x77/0x180
                       pcpu_alloc+0x1fd/0x7b0
                       __kmem_cache_create+0x358/0x540
                       create_cache+0xe1/0x1f0
                       kmem_cache_create_usercopy+0x1a5/0x270
                       kmem_cache_create+0x12/0x20
                       acpi_os_create_cache+0x18/0x30
                       acpi_ut_create_caches+0x47/0xab
                       acpi_ut_init_globals+0xa/0x21a
                       acpi_initialize_subsystem+0x30/0xa5
                       acpi_early_init+0x62/0xd6
                       start_kernel+0x797/0x86a
                       secondary_startup_64+0xa4/0xb0
     SOFTIRQ-ON-W at:
                       lock_acquire+0xda/0x3d0
                       _raw_spin_lock+0x2f/0x40
                       sync_global_pgds_l4+0x77/0x180
                       pcpu_alloc+0x1fd/0x7b0
                       __kmem_cache_create+0x358/0x540
                       create_cache+0xe1/0x1f0
                       kmem_cache_create_usercopy+0x1a5/0x270
                       kmem_cache_create+0x12/0x20
                       acpi_os_create_cache+0x18/0x30
                       acpi_ut_create_caches+0x47/0xab
                       acpi_ut_init_globals+0xa/0x21a
                       acpi_initialize_subsystem+0x30/0xa5
                       acpi_early_init+0x62/0xd6
                       start_kernel+0x797/0x86a
                       secondary_startup_64+0xa4/0xb0
     INITIAL USE at:
   }
   ... key      at: [<ffffffffb96340b8>] pgd_lock+0x18/0x40
   ... acquired at:
    _raw_spin_lock+0x2f/0x40
    sync_global_pgds_l4+0x77/0x180
    pcpu_alloc+0x1fd/0x7b0
    fib_nh_common_init+0x53/0x110
    fib6_nh_init+0x10c/0x700
    ip6_route_info_create+0x344/0x440
    ip6_route_add+0x18/0x90
    addrconf_prefix_route.isra.48+0x17b/0x210
    addrconf_notify+0x743/0x8c0
    notifier_call_chain+0x47/0x70
    __dev_notify_flags+0x9d/0x150
    dev_change_flags+0x48/0x60
    do_setlink+0x39d/0x1080
    rtnl_setlink+0x116/0x190
    rtnetlink_rcv_msg+0x188/0x4b0
    netlink_rcv_skb+0x75/0x140
    netlink_unicast+0x1ae/0x280
    netlink_sendmsg+0x253/0x490
    sock_sendmsg+0x5b/0x60
    __sys_sendto+0x12c/0x190
    __x64_sys_sendto+0x24/0x30
    do_syscall_64+0x60/0x230
    entry_SYSCALL_64_after_hwframe+0x49/0xb3
 
 -> (&ndev->lock){++.-}-{2:2} {
    HARDIRQ-ON-W at:
                     lock_acquire+0xda/0x3d0
                     _raw_write_lock_bh+0x34/0x40
                     ipv6_mc_init_dev+0x19/0xc0
                     ipv6_add_dev+0x2e5/0x490
                     addrconf_init+0x7f/0x250
                     inet6_init+0x1c3/0x373
                     do_one_initcall+0x70/0x340
                     kernel_init_freeable+0x249/0x2ca
                     kernel_init+0xa/0x10a
                     ret_from_fork+0x3a/0x50
    HARDIRQ-ON-R at:
                     lock_acquire+0xda/0x3d0
                     _raw_read_lock_bh+0x37/0x50
                     addrconf_dad_work+0xc6/0x560
                     process_one_work+0x25e/0x5c0
                     worker_thread+0x30/0x380
                     kthread+0x139/0x160
                     ret_from_fork+0x3a/0x50
    IN-SOFTIRQ-R at:
                     lock_acquire+0xda/0x3d0
                     _raw_read_lock_bh+0x37/0x50
                     mld_ifc_timer_expire+0x3c/0x350
                     call_timer_fn+0xa5/0x2f0
                     run_timer_softirq+0x1dd/0x580
                     __do_softirq+0xf8/0x4be
                     irq_exit+0xf1/0x100
                     smp_apic_timer_interrupt+0xd0/0x2a0
                     apic_timer_interrupt+0xf/0x20
                     cpuidle_enter_state+0xcd/0x440
                     cpuidle_enter+0x29/0x40
                     do_idle+0x24a/0x290
                     cpu_startup_entry+0x19/0x20
                     start_secondary+0x195/0x1e0
                     secondary_startup_64+0xa4/0xb0
    INITIAL USE at:
                    lock_acquire+0xda/0x3d0
                    _raw_write_lock_bh+0x34/0x40
                    ipv6_mc_init_dev+0x19/0xc0
                    ipv6_add_dev+0x2e5/0x490
                    addrconf_init+0x7f/0x250
                    inet6_init+0x1c3/0x373
                    do_one_initcall+0x70/0x340
                    kernel_init_freeable+0x249/0x2ca
                    kernel_init+0xa/0x10a
                    ret_from_fork+0x3a/0x50
  }
  ... key      at: [<ffffffffbaf727f0>] __key.78650+0x0/0x10
  ... acquired at:
    mark_lock+0x22e/0x740
    __lock_acquire+0x9e1/0x1c30
    lock_acquire+0xda/0x3d0
    _raw_read_lock_bh+0x37/0x50
    mld_ifc_timer_expire+0x3c/0x350
    call_timer_fn+0xa5/0x2f0
    run_timer_softirq+0x1dd/0x580
    __do_softirq+0xf8/0x4be
    irq_exit+0xf1/0x100
    smp_apic_timer_interrupt+0xd0/0x2a0
    apic_timer_interrupt+0xf/0x20
    cpuidle_enter_state+0xcd/0x440
    cpuidle_enter+0x29/0x40
    do_idle+0x24a/0x290
    cpu_startup_entry+0x19/0x20
    start_secondary+0x195/0x1e0
    secondary_startup_64+0xa4/0xb0
 
 
 stack backtrace:
 CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.7.0-rc3-test+ #249
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 Call Trace:
  <IRQ>
  dump_stack+0x8f/0xd0
  check_usage_forwards.cold.61+0x1e/0x27
  mark_lock+0x22e/0x740
  ? check_usage_backwards+0x1e0/0x1e0
  __lock_acquire+0x9e1/0x1c30
  lock_acquire+0xda/0x3d0
  ? mld_ifc_timer_expire+0x3c/0x350
  ? mld_dad_timer_expire+0xb0/0xb0
  ? mld_dad_timer_expire+0xb0/0xb0
  _raw_read_lock_bh+0x37/0x50
  ? mld_ifc_timer_expire+0x3c/0x350
  mld_ifc_timer_expire+0x3c/0x350
  ? mld_dad_timer_expire+0xb0/0xb0
  ? mld_dad_timer_expire+0xb0/0xb0
  call_timer_fn+0xa5/0x2f0
  ? mld_dad_timer_expire+0xb0/0xb0
  run_timer_softirq+0x1dd/0x580
  __do_softirq+0xf8/0x4be
  irq_exit+0xf1/0x100
  smp_apic_timer_interrupt+0xd0/0x2a0
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:cpuidle_enter_state+0xcd/0x440
 Code: 80 7c 24 13 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 0c 03 00 00 31 ff e8 6f 35 8b ff e8 1a 52 92 ff fb 66 0f 1f 44 00 00 <85> ed 0f 88 74 02 00 00 48 63 c5 4c 8b 3c 24 4c 2b 7c 24 08 48 8d
 RSP: 0018:ffff9a581981fe70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: 0000000000e2cf41 RBX: ffff9a581ab37400 RCX: 0000000000000000
 RDX: ffff9a581982d100 RSI: 0000000000000006 RDI: ffff9a581982d100
 RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb96f14c0
 R13: ffffffffb96f1678 R14: 0000000000000004 R15: 0000000000000004
  cpuidle_enter+0x29/0x40
  do_idle+0x24a/0x290
  cpu_startup_entry+0x19/0x20
  start_secondary+0x195/0x1e0
  secondary_startup_64+0xa4/0xb0


-- Steve


Thread overview: 46+ messages
2020-04-29  9:48 [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-04-29 10:59 ` Joerg Roedel
2020-04-29 12:28   ` Steven Rostedt
2020-04-29 14:07     ` Steven Rostedt
2020-04-29 14:10       ` Joerg Roedel
2020-04-29 14:32         ` Steven Rostedt
2020-04-29 15:44           ` Peter Zijlstra
2020-04-29 16:17       ` Joerg Roedel
2020-04-29 16:20         ` Joerg Roedel
2020-04-29 16:52           ` Steven Rostedt
2020-04-29 17:29             ` Mathieu Desnoyers
2020-04-29 18:51               ` Peter Zijlstra
2020-04-30 14:11       ` Joerg Roedel
2020-04-30 14:50         ` Joerg Roedel
2020-04-30 15:20           ` Mathieu Desnoyers
2020-04-30 16:16             ` Steven Rostedt
2020-04-30 16:18               ` Mathieu Desnoyers
2020-04-30 16:30                 ` Steven Rostedt
2020-04-30 16:35                   ` Mathieu Desnoyers
2020-04-30 15:23         ` Mathieu Desnoyers
2020-04-30 16:12           ` Steven Rostedt
2020-04-30 16:11         ` Steven Rostedt
2020-04-30 16:16           ` Mathieu Desnoyers
2020-04-30 16:25             ` Steven Rostedt
2020-04-30 19:14           ` Joerg Roedel
2020-05-01  1:13             ` Steven Rostedt
2020-05-01  2:26               ` Mathieu Desnoyers
2020-05-01  2:39                 ` Steven Rostedt
2020-05-01 10:16                   ` Joerg Roedel
2020-05-01 13:35                   ` Mathieu Desnoyers
2020-05-04 15:12                   ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Joerg Roedel
2020-05-04 15:28                     ` Mathieu Desnoyers
2020-05-04 15:31                       ` Joerg Roedel
2020-05-04 15:38                         ` Mathieu Desnoyers
2020-05-04 15:51                           ` Joerg Roedel
2020-05-04 17:04                           ` Steven Rostedt
2020-05-04 17:40                     ` Steven Rostedt [this message]
2020-05-04 18:38                       ` Joerg Roedel
2020-05-04 19:10                         ` Steven Rostedt
2020-05-05 12:31                           ` [PATCH] tracing: Call vmalloc_sync_mappings() after alloc_percpu() Joerg Roedel
2020-05-06 15:17                             ` Steven Rostedt
2020-05-08 14:42                               ` Joerg Roedel
2020-05-04 20:25                     ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Peter Zijlstra
2020-05-04 20:43                       ` Steven Rostedt
2020-05-01  4:20                 ` [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-05-01 13:22                   ` Mathieu Desnoyers
