All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taehee Yoo <ap420073@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>,
	David Miller <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>, Jiri Pirko <jiri@resnulli.us>
Subject: Re: [PATCH net 1/5] netdevsim: fix a race condition in netdevsim operations
Date: Fri, 24 Jan 2020 14:16:03 +0900	[thread overview]
Message-ID: <CAMArcTUOC4YTXrCDoMpbMDwiQmCVm_uXJzgbDt8UFYm0D=DfOw@mail.gmail.com> (raw)
In-Reply-To: <20200115061634.35da2950@cakuba.hsd1.ca.comcast.net>

On Wed, 15 Jan 2020 at 23:16, Jakub Kicinski <kuba@kernel.org> wrote:
>

Hi Jakub!

> On Wed, 15 Jan 2020 00:26:22 +0900, Taehee Yoo wrote:
> > On Sun, 12 Jan 2020 at 23:19, Jakub Kicinski wrote:
> > Hi Jakub,
> > Thank you for your review!
>
> Thank you for fixing these tricky bugs! :)
>
> > > Perhaps the entire bus dev structure should be freed from release?
> >
> > I tested this like this.
> >
> > @@ -146,6 +161,8 @@ static void nsim_bus_dev_release(struct device *dev)
> >         struct nsim_bus_dev *nsim_bus_dev = to_nsim_bus_dev(dev);
> >
> >         nsim_bus_dev_vfs_disable(nsim_bus_dev);
> > +       ida_free(&nsim_bus_dev_ids, nsim_bus_dev->dev.id);
> > +       kfree(nsim_bus_dev);
> >  }
> > @@ -300,8 +320,6 @@ nsim_bus_dev_new(unsigned int id, unsigned int port_count)
> >  static void nsim_bus_dev_del(struct nsim_bus_dev *nsim_bus_dev)
> >  {
> >         device_unregister(&nsim_bus_dev->dev);
> > -       ida_free(&nsim_bus_dev_ids, nsim_bus_dev->dev.id);
> > -       kfree(nsim_bus_dev);
> >  }
> >
> > It works well. but I'm not sure this is needed.
>
> My concern is that process A opens a sysfs file (eg. numvfs) then
> process B deletes the device, but process A still has a file descriptor
> (and device reference) so it may be able to write/read the numvfs file
> even though nsim_bus_dev was already freed.
>

If I understood kernfs correctly, kernfs internally avoid this situation.
a) When kernfs file is being removed, it waits for all users who are
operating this file(open/read/write, etc...)
b) When kernfs file is removed, the file is disallowed to be used.
c) Opened kernfs file descriptor also will be disallowed to use anymore.
d) File remove routine is finished, resources are freed.
So, user-after-free case couldn't occur.
The below piece of code is kernfs synchronize code.

static void kernfs_drain(struct kernfs_node *kn)
{
 ...
        wait_event(root->deactivate_waitq,
                   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
 ...
}

static ssize_t kernfs_fop_write(struct file *file, const char __user *user_buf,
                                size_t count, loff_t *ppos)
{
 ...
        if (!kernfs_get_active(of->kn)) {
                mutex_unlock(&of->mutex);
                len = -ENODEV;
                goto out_free;
        }

        ops = kernfs_ops(of->kn);
        if (ops->write)
                len = ops->write(of, buf, len, *ppos);
        else
                len = -EINVAL;

        kernfs_put_active(of->kn);
 ...
}

void kernfs_put_active(struct kernfs_node *kn)
{
 ...
        v = atomic_dec_return(&kn->active);
        if (likely(v != KN_DEACTIVATED_BIAS))
                return;

        wake_up_all(&kernfs_root(kn)->deactivate_waitq);
}

I have tested this code, it works well.

> I may very well be wrong, and something else may be preventing this
> condition. It's just a bit strange to see release free an internal
> sub-structure, while the main structure is freed immediately..
>

I didn't think about ordering of resource release routine.
So I took a look at the release routine.

del_device_store()
    nsim_bus_dev_del()
        nsim_bus_dev_del()
            kobject_put()
                device_release()
                    nsim_bus_dev_release()
                        kfree(nsim_bus_dev->vconfigs)
    kfree(nsim_bus_dev)

Before freeing nsim_bus_dev, all resources are freed in the
device_unregister(). So, I think it's safe.

> > > >       unsigned int num_vfs;
> > > >       int ret;
> > > >
> > > > +     if (!mutex_trylock(&nsim_bus_dev_ops_lock))
> > > > +             return -EBUSY;
> > >
> > > Why the trylocks? Are you trying to catch the races between unregister
> > > and other ops?
> > >
> >
> > The reason is to avoid deadlock.
> > If we use mutex_lock() instead of mutex_trylock(), the below message
> > will be printed and actual deadlock also appeared.
>
> > [  426.907883][  T805]  Possible unsafe locking scenario:
> > [  426.907883][  T805]
> > [  426.908715][  T805]        CPU0                    CPU1
> > [  426.909312][  T805]        ----                    ----
> > [  426.909902][  T805]   lock(kn->count#170);
> > [  426.910372][  T805]
> > lock(nsim_bus_dev_ops_lock);
> > [  426.911277][  T805]                                lock(kn->count#170);
> > [  426.912032][  T805]   lock(nsim_bus_dev_ops_lock);
>
> > Locking ordering of {new/del}_device() and {new/del}_port is different.
> > So, a deadlock could occur.
>
> Hm, we can't use the same lock for the bus ops and port ops.
> But the port ops already take port lock, do we really need
> another lock there?
>

A synchronize routine is needed.
new_port() and del_port() operations access many device resources.
It could be used even before resources are allocated or initialized.
So, new_port() and del_port() should be allowed to use after resources
are initialized. But sriov_numvfs() doesn't use uninitialized resource
so it doesn't make any problem.

If a simple flag variable is used, we can avoid using a trylock.
The flag is set after resources are initialized.
So if new_port() and del_port() check the flag, it doesn't access
uninitialized resources.

I would like to try to avoid using trylock.

> Also does nsim_bus_exit() really need to iterate over devices to remove
> them? Does core not do it for us?

I couldn't find the logic, which remove devices.
So I think it's needed.

Thank you!
Taehee Yoo

  reply	other threads:[~2020-01-24  5:16 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-11 16:36 [PATCH net 1/5] netdevsim: fix a race condition in netdevsim operations Taehee Yoo
2020-01-12 14:19 ` Jakub Kicinski
2020-01-14 15:26   ` Taehee Yoo
2020-01-15 14:16     ` Jakub Kicinski
2020-01-24  5:16       ` Taehee Yoo [this message]
2020-01-24 13:53         ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAMArcTUOC4YTXrCDoMpbMDwiQmCVm_uXJzgbDt8UFYm0D=DfOw@mail.gmail.com' \
    --to=ap420073@gmail.com \
    --cc=davem@davemloft.net \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.