All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kimberly Brown <kimbrownkd@gmail.com>
To: Dexuan Cui <decui@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>,
	Sasha Levin <sashal@kernel.org>, Long Li <longli@microsoft.com>,
	Sasha Levin <Alexander.Levin@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	KY Srinivasan <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"devel@linuxdriverproject.org" <devel@linuxdriverproject.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions
Date: Sat, 2 Feb 2019 15:07:35 -0500	[thread overview]
Message-ID: <20190202200735.GA2637@ubu-Virtual-Machine> (raw)
In-Reply-To: <PU1P153MB0169227534F832C2DAEC02B4BF920@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Fri, Feb 01, 2019 at 06:24:24PM +0000, Dexuan Cui wrote:
> > From: Kimberly Brown <kimbrownkd@gmail.com>
> > Sent: Thursday, January 31, 2019 9:47 AM
> > ...
> > 2) Prevent a deadlock that can occur between the proposed mutex_lock()
> > call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
> Hi Kim,
> Can you please share more details about the deadlock? 
> It's unclear to me how the deadlock happens.
>

Hi Dexuan,

I encountered the deadlock by:
1) Adding a call to msleep() before acquiring the mutex in
vmbus_chan_attr_show()
2) Opening a hv_netvsc subchannel's sysfs file
3) Removing hv_netvsc while the sysfs file is opening

Here's the lockdep output:

[  164.167699] hv_vmbus: unregistering driver hv_netvsc

[  164.171660] ======================================================
[  164.171661] WARNING: possible circular locking dependency detected
[  164.171663] 5.0.0-rc1+ #58 Not tainted
[  164.171664] ------------------------------------------------------
[  164.171666] kworker/0:1/12 is trying to acquire lock:
[  164.171668] 00000000664f9893 (kn->count#43){++++}, at: kernfs_remove+0x23/0x40
[  164.171676]
               but task is already holding lock:
[  164.171677] 000000007b9e8443 (&vmbus_connection.channel_mutex){+.+.}, at: vmbus_onoffer_rescind+0x1ae/0x210 [hv_vmbus]
[  164.171686]
               which lock already depends on the new lock.

[  164.171687]
               the existing dependency chain (in reverse order) is:
[  164.171688]
               -> #1 (&vmbus_connection.channel_mutex){+.+.}:
[  164.171694]        __mutex_lock+0x65/0x9b0
[  164.171696]        mutex_lock_nested+0x1b/0x20
[  164.171700]        vmbus_chan_attr_show+0x3f/0x90 [hv_vmbus]
[  164.171703]        sysfs_kf_seq_show+0xa4/0x130
[  164.171705]        kernfs_seq_show+0x2d/0x30
[  164.171708]        seq_read+0xe2/0x410
[  164.171711]        kernfs_fop_read+0x14e/0x1a0
[  164.171714]        __vfs_read+0x3a/0x1a0
[  164.171716]        vfs_read+0x91/0x140
[  164.171718]        ksys_read+0x58/0xc0
[  164.171721]        __x64_sys_read+0x1a/0x20
[  164.171724]        do_syscall_64+0x65/0x1b0
[  164.171727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  164.171728]
               -> #0 (kn->count#43){++++}:
[  164.171732]        lock_acquire+0xa3/0x180
[  164.171734]        __kernfs_remove+0x278/0x300
[  164.171737]        kernfs_remove+0x23/0x40
[  164.171739]        sysfs_remove_dir+0x51/0x60
[  164.171741]        kobject_del.part.7+0x13/0x40
[  164.171743]        kobject_put+0x6a/0x1a0
[  164.171748]        hv_process_channel_removal+0xfe/0x180 [hv_vmbus]
[  164.171752]        vmbus_onoffer_rescind+0x20a/0x210 [hv_vmbus]
[  164.171756]        vmbus_onmessage+0x5f/0x150 [hv_vmbus]
[  164.171760]        vmbus_onmessage_work+0x22/0x30 [hv_vmbus]
[  164.171763]        process_one_work+0x291/0x5c0
[  164.171765]        worker_thread+0x34/0x400
[  164.171767]        kthread+0x124/0x140
[  164.171770]        ret_from_fork+0x3a/0x50
[  164.171771]
               other info that might help us debug this:

[  164.171772]  Possible unsafe locking scenario:

[  164.171773]        CPU0                    CPU1
[  164.171775]        ----                    ----
[  164.171776]   lock(&vmbus_connection.channel_mutex);
[  164.171777]                                lock(kn->count#43);
[  164.171779]                                lock(&vmbus_connection.channel_mutex);
[  164.171781]   lock(kn->count#43);
[  164.171783]
                *** DEADLOCK ***

[  164.171785] 3 locks held by kworker/0:1/12:
[  164.171786]  #0: 000000002128b29f ((wq_completion)"events"){+.+.}, at: process_one_work+0x20f/0x5c0
[  164.171790]  #1: 0000000041d2602c ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x20f/0x5c0
[  164.171794]  #2: 000000007b9e8443 (&vmbus_connection.channel_mutex){+.+.}, at: vmbus_onoffer_rescind+0x1ae/0x210 [hv_vmbus]
[  164.171799]
               stack backtrace:
[  164.171802] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.0.0-rc1+ #58
[  164.171804] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
[  164.171809] Workqueue: events vmbus_onmessage_work [hv_vmbus]
[  164.171811] Call Trace:
[  164.171816]  dump_stack+0x8e/0xd5
[  164.171819]  print_circular_bug.isra.37+0x1e7/0x1f5
[  164.171822]  __lock_acquire+0x1427/0x1490
[  164.171826]  ? sched_clock+0x9/0x10
[  164.171830]  lock_acquire+0xa3/0x180
[  164.171832]  ? lock_acquire+0xa3/0x180
[  164.171835]  ? kernfs_remove+0x23/0x40
[  164.171838]  __kernfs_remove+0x278/0x300
[  164.171840]  ? kernfs_remove+0x23/0x40
[  164.171843]  kernfs_remove+0x23/0x40
[  164.171846]  sysfs_remove_dir+0x51/0x60
[  164.171848]  kobject_del.part.7+0x13/0x40
[  164.171850]  kobject_put+0x6a/0x1a0
[  164.171855]  hv_process_channel_removal+0xfe/0x180 [hv_vmbus]
[  164.171859]  vmbus_onoffer_rescind+0x20a/0x210 [hv_vmbus]
[  164.171863]  vmbus_onmessage+0x5f/0x150 [hv_vmbus]
[  164.171868]  vmbus_onmessage_work+0x22/0x30 [hv_vmbus]
[  164.171870]  process_one_work+0x291/0x5c0
[  164.171874]  worker_thread+0x34/0x400
[  164.171877]  kthread+0x124/0x140
[  164.171879]  ? process_one_work+0x5c0/0x5c0
[  164.171881]  ? kthread_park+0x90/0x90
[  164.171884]  ret_from_fork+0x3a/0x50

==========================================================================


> > I've identified two possible solutions to the deadlock:
> > 
> > 1) Add a new mutex to the vmbus_channel struct which protects changes to
> > "channel->state". Use this new mutex in vmbus_chan_attr_show() instead of
> > "vmbus_connection.channel_mutex".
> > 
> > 2) Use "vmbus_connection.channel_mutex" in vmbus_chan_attr_show() as
> > originally proposed, and acquire it with mutex_trylock(). If the mutex
> > cannot be acquired, return -EINVAL.
> It looks more like a workaround. IMO we should try to find a real fix. :-)
>  

Good point. I think that the root cause of the deadlock problem is that
the calls to free_channel() in hv_process_channel_removal() are within a
section that's locked with vmbus_connection.channel_mutex. So, first, I
need to determine whether free_channel() needs to be called within the
locked section.

If free_channel() does need to be called within the locked section, then
there's no way to prevent the possibility of a deadlock when
vmbus_connection.channel_mutex is used in vmbus_chan_attr_show(). A
different solution, such as a new mutex to protect changes to
"channel->state", is necessary.

If free_channel() does not need to be in the locked section, then we may
be able to restructure the functions to prevent the deadlock.

I see that there was a race condition problem when
hv_process_channel_removal() was called on an already freed channel
(fixed in commit 192b2d78722f, "Fix bugs in rescind handling"), so I
suspect that free_channel() needs to stay in the locked section.


> > I'm leaning towards (2), using mutex_trylock().
> > "vmbus_connection.channel_mutex"
> > appears to be used only when a channel is being opened or closed, so
> > vmbus_chan_attr_show() should be able to acquire the mutex when the
> > channel is in use.
> > 
> > If anyone has suggestions, please let me know.
> > 
> > Thanks,
> > Kim
> 
> Thanks,
> -- Dexuan

  reply	other threads:[~2019-02-02 20:07 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-22  2:07 [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions Kimberly Brown
2019-01-22  3:46 ` Dexuan Cui
2019-01-22  6:42   ` Kimberly Brown
2019-01-22 18:40     ` Dexuan Cui
2019-01-28 19:58       ` Kimberly Brown
2019-01-29 19:20         ` Dexuan Cui
2019-01-31 15:19           ` Sasha Levin
2019-01-31 16:45             ` Michael Kelley
2019-01-31 17:47               ` Kimberly Brown
2019-02-01 14:13                 ` Sasha Levin
2019-02-01 18:24                 ` Dexuan Cui
2019-02-02 20:07                   ` Kimberly Brown [this message]
2019-02-15  1:54                     ` Sasha Levin
2019-02-15  2:27                       ` Kimberly Brown
2019-02-22  3:46 ` [PATCH v2 0/2] Fix a race condition vulnerability in "_show" functions Kimberly Brown
2019-02-22  3:47   ` [PATCH v2 1/2] Drivers: hv: vmbus: Refactor chan->state if statement Kimberly Brown
2019-02-24 16:54     ` Michael Kelley
2019-02-22  3:47   ` [PATCH v2 2/2] Drivers: hv: vmbus: Add a channel ring buffer mutex lock Kimberly Brown
2019-02-24 16:53     ` Michael Kelley
2019-02-26  6:24       ` Kimberly Brown
2019-03-14 20:04   ` [PATCH v3 0/3] Drivers: hv: vmbus: Fix a race condition in "_show" functions Kimberly Brown
2019-03-14 20:05     ` [PATCH v3 1/3] Drivers: hv: vmbus: Refactor chan->state if statement Kimberly Brown
2019-03-14 20:05     ` [PATCH v3 2/3] Drivers: hv: vmbus: Set ring_info field to 0 and remove memset Kimberly Brown
2019-03-29 16:01       ` Michael Kelley
2019-03-14 20:05     ` [PATCH v3 3/3] Drivers: hv: vmbus: Fix race condition with new ring_buffer_info mutex Kimberly Brown
2019-03-14 22:45       ` Stephen Hemminger
2019-03-17  1:49         ` Kimberly Brown
2019-03-20 20:06           ` Stephen Hemminger
2019-03-21  3:47             ` Kimberly Brown
2019-03-21 16:04               ` Michael Kelley
2019-03-28  4:30                 ` Kimberly Brown
2019-03-28 18:42                   ` Stephen Hemminger
2019-03-29 16:04       ` Michael Kelley
2019-04-10 22:59     ` [PATCH v3 0/3] Drivers: hv: vmbus: Fix a race condition in "_show" functions Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190202200735.GA2637@ubu-Virtual-Machine \
    --to=kimbrownkd@gmail.com \
    --cc=Alexander.Levin@microsoft.com \
    --cc=decui@microsoft.com \
    --cc=devel@linuxdriverproject.org \
    --cc=haiyangz@microsoft.com \
    --cc=kys@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=mikelley@microsoft.com \
    --cc=sashal@kernel.org \
    --cc=sthemmin@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.