* [PATCH] fs: fix use-after-free in __fput() when a chardev is removed but a file is still open

From: Vladis Dronov @ 2019-11-25 12:53 UTC
  To: Alexander Viro, Richard Cochran, linux-fsdevel
  Cc: netdev, linux-kernel, vdronov

In a case when a chardev file (like /dev/ptp0) is open but an underlying
device is removed, closing this file leads to a use-after-free. This
reproduces easily in a KVM virtual machine:

# cat openptp0.c
int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
# uname -r
5.4.0-219d5433
# cat /proc/cmdline
... slub_debug=FZP
# modprobe ptp_kvm
# ./openptp0 &
[1] 670
opened /dev/ptp0, sleeping 10s...
# rmmod ptp_kvm
# ls /dev/ptp*
ls: cannot access '/dev/ptp*': No such file or directory
# ...woken up
[ 102.375849] general protection fault: 0000 [#1] SMP
[ 102.377372] CPU: 1 PID: 670 Comm: openptp0 Not tainted 5.4.0-219d5433 #1
[ 102.379163] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 102.381129] RIP: 0010:module_put.part.0+0x7/0x80
[ 102.383019] RSP: 0018:ffff9ba440687e00 EFLAGS: 00010202
[ 102.383451] RAX: 0000000000002000 RBX: 6b6b6b6b6b6b6b6b RCX: ffff91e736800ad0
[ 102.384030] RDX: ffffcf6408bc2808 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
[ 102.386032] ... ^^^ a slub poison
[ 102.389866] Call Trace:
[ 102.390086] __fput+0x21f/0x240
[ 102.390363] task_work_run+0x79/0x90
[ 102.390671] do_exit+0x2c9/0xad0
[ 102.390931] ? vfs_write+0x16a/0x190
[ 102.391241] do_group_exit+0x35/0x90
[ 102.391549] __x64_sys_exit_group+0xf/0x10
[ 102.391898] do_syscall_64+0x3d/0x110
[ 102.392240] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 102.392695] RIP: 0033:0x7f0fa7016246
[ 102.396615] ...
[ 102.397225] Modules linked in: [last unloaded: ptp_kvm]
[ 102.410323] Fixing recursive fault but reboot is needed!

This happens in:

static void __fput(struct file *file)
{ ...
	if (file->f_op->release)
		file->f_op->release(inode, file); <<< cdev is kfree'd here
	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
		     !(mode & FMODE_PATH))) {
		cdev_put(inode->i_cdev); <<< cdev fields are accessed here

because of:

__fput()
    posix_clock_release()
        kref_put(&clk->kref, delete_clock) <<< the last reference
            delete_clock()
                delete_ptp_clock()
                    kfree(ptp) <<< cdev is embedded in ptp
    cdev_put
        module_put(p->owner) <<< *p is kfree'd

The fix is to call cdev_put() before file->f_op->release(). This fixes the
class of bugs where a chardev device is removed while its file is open, for
example:

# lspci
00:09.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer
# ./openwdog0 &
[1] 672
opened /dev/watchdog0, sleeping 10s...
# echo 1 > /sys/devices/pci0000:00/0000:00:09.0/remove
# ls /dev/watch*
ls: cannot access '/dev/watch*': No such file or directory
# ...woken up
[ 63.500271] general protection fault: 0000 [#1] SMP
[ 63.501757] CPU: 1 PID: 672 Comm: openwdog0 Not tainted 5.4.0-219d5433 #4
[ 63.503605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 63.507064] RIP: 0010:module_put.part.0+0x7/0x80
[ 63.513841] RSP: 0018:ffffb96b00667e00 EFLAGS: 00010202
[ 63.515376] RAX: 0000000000002000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000150013
[ 63.517478] RDX: 0000000000000246 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b

Analyzed-by: Stephen Johnston <sjohnsto@redhat.com>
Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
---
 fs/file_table.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 30d55c9a1744..21ba35024950 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -276,12 +276,12 @@ static void __fput(struct file *file)
 		if (file->f_op->fasync)
 			file->f_op->fasync(-1, file, 0);
 	}
-	if (file->f_op->release)
-		file->f_op->release(inode, file);
 	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
 		     !(mode & FMODE_PATH))) {
 		cdev_put(inode->i_cdev);
 	}
+	if (file->f_op->release)
+		file->f_op->release(inode, file);
 	fops_put(file->f_op);
 	put_pid(file->f_owner.pid);
 	if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
-- 
2.20.1
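In both oops reports RBX and RDI hold 0x6b6b6b6b6b6b6b6b and the fault hits inside module_put(). With slub_debug=P, SLUB fills freed objects with the byte 0x6b (POISON_FREE), so the registers are consistent with module_put() being handed an owner pointer that was read out of the already-freed ptp object. The small userspace sketch below is illustrative only and assumes x86-64 with 4-level paging; the helper names are hypothetical. It shows that a pointer-sized load from poisoned memory yields 0x6b6b6b6b6b6b6b6b, and that this value is a non-canonical address, which is why the bad dereference is reported as a general protection fault rather than an ordinary page fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Byte SLUB writes over freed objects when slub_debug=P is set
 * (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE_BYTE 0x6b

/* Hypothetical helper: model a pointer-sized load from an object that has
 * already been freed and poisoned, e.g. reading a stale 'owner' field. */
static uint64_t load_from_poisoned_object(size_t offset)
{
	unsigned char obj[64];
	uint64_t word;

	assert(offset + sizeof(word) <= sizeof(obj));
	memset(obj, POISON_FREE_BYTE, sizeof(obj)); /* what the free path does */
	memcpy(&word, obj + offset, sizeof(word));  /* the stale load */
	return word;
}

/* With x86-64 4-level paging an address is canonical only if bits 63..47
 * are all zeros or all ones. Touching a non-canonical address raises #GP. */
static bool is_canonical_4level(uint64_t addr)
{
	uint64_t top17 = addr >> 47; /* the 17 bits 63..47 */

	return top17 == 0 || top17 == 0x1ffff;
}
```

For comparison, the kernel stack pointer in the first oops (RSP ffff9ba440687e00) and the user-space RIP (0x7f0fa7016246) are both canonical, while the poisoned value is not.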
* Re: [PATCH] fs: fix use-after-free in __fput() when a chardev is removed but a file is still open

From: Al Viro @ 2019-12-08 19:49 UTC
  To: Vladis Dronov; +Cc: Richard Cochran, linux-fsdevel, netdev, linux-kernel

On Mon, Nov 25, 2019 at 01:53:42PM +0100, Vladis Dronov wrote:
> In a case when a chardev file (like /dev/ptp0) is open but an underlying
> device is removed, closing this file leads to a use-after-free. This
> reproduces easily in a KVM virtual machine:
>
> # cat openptp0.c
> int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }

> static void __fput(struct file *file)
> { ...
> 	if (file->f_op->release)
> 		file->f_op->release(inode, file); <<< cdev is kfree'd here
> 	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
> 		     !(mode & FMODE_PATH))) {
> 		cdev_put(inode->i_cdev); <<< cdev fields are accessed here
>
> because of:
>
> __fput()
>     posix_clock_release()
>         kref_put(&clk->kref, delete_clock) <<< the last reference
>             delete_clock()
>                 delete_ptp_clock()
>                     kfree(ptp) <<< cdev is embedded in ptp
>     cdev_put
>         module_put(p->owner) <<< *p is kfree'd
>
> The fix is to call cdev_put() before file->f_op->release(). This fixes the
> class of bugs where a chardev device is removed while its file is open, for
> example:

And what's to prevent rmmod coming and freeing ->release code right as you
are executing it?
* Re: [PATCH] fs: fix use-after-free in __fput() when a chardev is removed but a file is still open

From: Al Viro @ 2019-12-08 19:53 UTC
  To: Vladis Dronov; +Cc: Richard Cochran, linux-fsdevel, netdev, linux-kernel

On Sun, Dec 08, 2019 at 07:49:07PM +0000, Al Viro wrote:
> On Mon, Nov 25, 2019 at 01:53:42PM +0100, Vladis Dronov wrote:
> > In a case when a chardev file (like /dev/ptp0) is open but an underlying
> > device is removed, closing this file leads to a use-after-free. This
> > reproduces easily in a KVM virtual machine:
> >
> > # cat openptp0.c
> > int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
>
> > static void __fput(struct file *file)
> > { ...
> > 	if (file->f_op->release)
> > 		file->f_op->release(inode, file); <<< cdev is kfree'd here
>
> > 	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
> > 		     !(mode & FMODE_PATH))) {
> > 		cdev_put(inode->i_cdev); <<< cdev fields are accessed here
> >
> > because of:
> >
> > __fput()
> >     posix_clock_release()
> >         kref_put(&clk->kref, delete_clock) <<< the last reference
> >             delete_clock()
> >                 delete_ptp_clock()
> >                     kfree(ptp) <<< cdev is embedded in ptp
> >     cdev_put
> >         module_put(p->owner) <<< *p is kfree'd
> >
> > The fix is to call cdev_put() before file->f_op->release(). This fixes the
> > class of bugs where a chardev device is removed while its file is open, for
> > example:
>
> And what's to prevent rmmod coming and freeing ->release code right as you
> are executing it?

FWIW, the bug here seems to be that the lifetime rules of cdev are fucked -
if it can get freed while its ->kobj is still alive, we have something very
wrong there. IOW, you have ptp lifetime controlled by *TWO* refcounts - that
of clk and that of cdev->kobj. That doesn't work. Replace that kfree() with
dropping a kobject reference, perhaps, so that freeing would've been done by
its release callback?
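The two-refcount problem can be made concrete with a small userspace model. This is a hypothetical simulation, not kernel code: the structure, counters, and function names are made up, nothing is actually freed, and a 'freed' flag marks the point where delete_ptp_clock() would kfree(ptp). It replays the sequence from the report (one open file, rmmod, then close, with ->release() called before cdev_put() as in the unpatched __fput()) and records whether the embedded cdev is touched after the clock's kref reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pre-fix ptp_clock lifetime. */
struct model_ptp {
	int clk_kref;        /* models posix_clock::kref */
	int cdev_kobj_refs;  /* models the refcount of the embedded cdev.kobj */
	bool freed;          /* where delete_ptp_clock() would kfree(ptp) */
	bool use_after_free; /* set if the object is touched after 'freed' */
};

static void touch(struct model_ptp *p)
{
	if (p->freed)
		p->use_after_free = true;
}

/* kref_put(&clk->kref, delete_clock): the last clock reference frees the
 * whole object, no matter how many cdev references are still held. */
static void model_kref_put(struct model_ptp *p)
{
	touch(p);
	assert(p->clk_kref > 0);
	if (--p->clk_kref == 0)
		p->freed = true;
}

/* cdev_del()/cdev_put(): both dereference the cdev embedded in the object. */
static void model_cdev_put(struct model_ptp *p)
{
	touch(p);
	assert(p->cdev_kobj_refs > 0);
	p->cdev_kobj_refs--;
}

static struct model_ptp replay_rmmod_then_close(void)
{
	/* Registration and the one open file each hold a reference on both. */
	struct model_ptp p = { .clk_kref = 2, .cdev_kobj_refs = 2 };

	/* rmmod -> posix_clock_unregister(): cdev_del(), then kref_put(). */
	model_cdev_put(&p);
	model_kref_put(&p);

	/* close -> __fput(): ->release() drops the last kref, then cdev_put(). */
	model_kref_put(&p);
	model_cdev_put(&p);
	return p;
}
```

Because the two counters are independent, the object is marked freed while the cdev kobject still holds a reference, and the final cdev_put() touches it afterwards: the use-after-free seen in the oops.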
* [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev

From: Vladis Dronov @ 2019-12-27 2:26 UTC
  To: linux-fsdevel, Alexander Viro, Richard Cochran
  Cc: vdronov, Al Viro, netdev, linux-kernel

In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
device is removed, closing this file leads to a race. This reproduces
easily in a kvm virtual machine:

ts# cat openptp0.c
int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
ts# uname -r
5.5.0-rc3-46cf053e
ts# cat /proc/cmdline
... slub_debug=FZP
ts# modprobe ptp_kvm
ts# ./openptp0 &
[1] 670
opened /dev/ptp0, sleeping 10s...
ts# rmmod ptp_kvm
ts# ls /dev/ptp*
ls: cannot access '/dev/ptp*': No such file or directory
ts# ...woken up
[ 48.010809] general protection fault: 0000 [#1] SMP
[ 48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
[ 48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 48.016270] RIP: 0010:module_put.part.0+0x7/0x80
[ 48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
[ 48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
[ 48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
[ 48.019470] ... ^^^ a slub poison
[ 48.023854] Call Trace:
[ 48.024050] __fput+0x21f/0x240
[ 48.024288] task_work_run+0x79/0x90
[ 48.024555] do_exit+0x2af/0xab0
[ 48.024799] ? vfs_write+0x16a/0x190
[ 48.025082] do_group_exit+0x35/0x90
[ 48.025387] __x64_sys_exit_group+0xf/0x10
[ 48.025737] do_syscall_64+0x3d/0x130
[ 48.026056] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 48.026479] RIP: 0033:0x7f53b12082f6
[ 48.026792] ...
[ 48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
[ 48.045001] Fixing recursive fault but reboot is needed!
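The openptp0.c used in both reports is shown only in elided form. A minimal sketch of what such a program might look like follows; it is a hypothetical reconstruction, not the author's code, and the helper name hold_open() is made up. It prints the two messages visible in the transcript. The call trace shows the file being dropped from do_exit() via task_work_run(), so the original most likely just let the process exit with the file still open; the sketch calls fclose() explicitly, which also ends in __fput() once the last reference to the file goes away.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical reconstruction of the elided openptp0.c: open a chardev,
 * wait while the underlying device is removed, then drop the file. */
static int hold_open(const char *path, unsigned int seconds)
{
	FILE *fp = fopen(path, "r");

	if (fp == NULL) {
		perror(path);
		return -1;
	}
	printf("opened %s, sleeping %us...\n", path, seconds);
	fflush(stdout);
	sleep(seconds);   /* remove the device (e.g. rmmod ptp_kvm) in this window */
	printf("...woken up\n");
	fflush(stdout);
	fclose(fp);       /* dropping the last file reference ends up in __fput() */
	return 0;
}
```

A main() for the scenario above would just return hold_open("/dev/ptp0", 10) != 0; the rmmod ptp_kvm (or the PCI remove in the watchdog case) has to happen while it sleeps.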
This happens in:

static void __fput(struct file *file)
{ ...
	if (file->f_op->release)
		file->f_op->release(inode, file); <<< cdev is kfree'd here
	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
		     !(mode & FMODE_PATH))) {
		cdev_put(inode->i_cdev); <<< cdev fields are accessed here

Namely:

__fput()
    posix_clock_release()
        kref_put(&clk->kref, delete_clock) <<< the last reference
            delete_clock()
                delete_ptp_clock()
                    kfree(ptp) <<< cdev is embedded in ptp
    cdev_put
        module_put(p->owner) <<< *p is kfree'd, bang!

Here cdev is embedded in posix_clock which is embedded in ptp_clock.
The race happens because ptp_clock's lifetime is controlled by two
refcounts: kref and cdev.kobj in posix_clock. This is wrong.

Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
created especially for such cases. This way the parent device with its
ptp_clock is not released until all references to the cdev are released.
This adds a requirement that an initialized but not exposed struct
device should be provided to posix_clock_register() by a caller instead
of a simple dev_t.

This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
the race between the release of watchdog_core_data and cdev"). See
details of the implementation in the commit 233ed09d7fda ("chardev: add
helper function to register char devs with a struct device").
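The effect of the v2 scheme on the same sequence can be sketched with a second hypothetical userspace model (again not kernel code; all names are made up). Following the commit message, it assumes that cdev_device_add() makes the struct device the parent of the cdev's kobject and that the cdev keeps a reference on that parent until the cdev itself is released; the device's release callback (ptp_clock_release() in the patch) is the only place the object is freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the v2 lifetime: one object, freed only by the
 * release callback of its struct device. */
struct model_ptp_v2 {
	int dev_refs;        /* models ptp->dev.kobj */
	int cdev_refs;       /* models clock.cdev.kobj, embedded in the same object */
	bool freed;          /* where ptp_clock_release() would free ptp */
	bool use_after_free;
};

static void touch(struct model_ptp_v2 *p)
{
	if (p->freed)
		p->use_after_free = true;
}

static void model_get_device(struct model_ptp_v2 *p)
{
	touch(p);
	assert(p->dev_refs > 0);
	p->dev_refs++;
}

/* put_device(): the last reference runs the device's release callback. */
static void model_put_device(struct model_ptp_v2 *p)
{
	touch(p);
	assert(p->dev_refs > 0);
	if (--p->dev_refs == 0)
		p->freed = true;
}

static void model_cdev_get(struct model_ptp_v2 *p)
{
	touch(p);
	assert(p->cdev_refs > 0);
	p->cdev_refs++;
}

/* When the cdev kobject is released it drops the reference it held on
 * its parent device. */
static void model_cdev_put(struct model_ptp_v2 *p)
{
	touch(p);
	assert(p->cdev_refs > 0);
	if (--p->cdev_refs == 0)
		model_put_device(p);
}

static struct model_ptp_v2 replay_rmmod_then_close_v2(void)
{
	struct model_ptp_v2 p = { .dev_refs = 1 }; /* device_initialize() */

	/* posix_clock_register() -> cdev_device_add(): the cdev pins its parent. */
	p.cdev_refs = 1;
	model_get_device(&p);

	/* open: cdev_get() on the chardev, get_device() in posix_clock_open(). */
	model_cdev_get(&p);
	model_get_device(&p);

	/* rmmod -> posix_clock_unregister(): cdev_device_del(), put_device(). */
	model_cdev_put(&p);
	model_put_device(&p);

	/* close -> __fput(): ->release() (put_device), then cdev_put(). */
	model_put_device(&p);
	model_cdev_put(&p); /* last cdev ref drops the last device ref */
	return p;
}
```

The close path still drops the device reference before the cdev reference, in the same order as the unpatched __fput(), yet nothing is touched after the object is freed: the final cdev reference is what releases the final device reference.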
Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
Analyzed-by: Stephen Johnston <sjohnsto@redhat.com>
Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
---
 drivers/ptp/ptp_clock.c     | 31 ++++++++++++++-----------------
 drivers/ptp/ptp_private.h   |  2 +-
 include/linux/posix-clock.h | 19 +++++++++++--------
 kernel/time/posix-clock.c   | 31 +++++++++++++------------------
 4 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index e60eab7f8a61..61fafe0374ce 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -166,9 +166,9 @@ static struct posix_clock_operations ptp_clock_ops = {
 	.read = ptp_read,
 };
 
-static void delete_ptp_clock(struct posix_clock *pc)
+static void ptp_clock_release(struct device *dev)
 {
-	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+	struct ptp_clock *ptp = container_of(dev, struct ptp_clock, dev);
 
 	mutex_destroy(&ptp->tsevq_mux);
 	mutex_destroy(&ptp->pincfg_mux);
@@ -213,7 +213,6 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
 	}
 
 	ptp->clock.ops = ptp_clock_ops;
-	ptp->clock.release = delete_ptp_clock;
 	ptp->info = info;
 	ptp->devid = MKDEV(major, index);
 	ptp->index = index;
@@ -236,15 +235,6 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
 	if (err)
 		goto no_pin_groups;
 
-	/* Create a new device in our class. */
-	ptp->dev = device_create_with_groups(ptp_class, parent, ptp->devid,
-					     ptp, ptp->pin_attr_groups,
-					     "ptp%d", ptp->index);
-	if (IS_ERR(ptp->dev)) {
-		err = PTR_ERR(ptp->dev);
-		goto no_device;
-	}
-
 	/* Register a new PPS source. */
 	if (info->pps) {
 		struct pps_source_info pps;
@@ -260,8 +250,18 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
 		}
 	}
 
-	/* Create a posix clock. */
-	err = posix_clock_register(&ptp->clock, ptp->devid);
+	/* Initialize a new device of our class in our clock structure. */
+	device_initialize(&ptp->dev);
+	ptp->dev.devt = ptp->devid;
+	ptp->dev.class = ptp_class;
+	ptp->dev.parent = parent;
+	ptp->dev.groups = ptp->pin_attr_groups;
+	ptp->dev.release = ptp_clock_release;
+	dev_set_drvdata(&ptp->dev, ptp);
+	dev_set_name(&ptp->dev, "ptp%d", ptp->index);
+
+	/* Create a posix clock and link it to the device. */
+	err = posix_clock_register(&ptp->clock, &ptp->dev);
 	if (err) {
 		pr_err("failed to create posix clock\n");
 		goto no_clock;
@@ -273,8 +273,6 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
 	if (ptp->pps_source)
 		pps_unregister_source(ptp->pps_source);
 no_pps:
-	device_destroy(ptp_class, ptp->devid);
-no_device:
 	ptp_cleanup_pin_groups(ptp);
 no_pin_groups:
 	if (ptp->kworker)
@@ -304,7 +302,6 @@ int ptp_clock_unregister(struct ptp_clock *ptp)
 	if (ptp->pps_source)
 		pps_unregister_source(ptp->pps_source);
 
-	device_destroy(ptp_class, ptp->devid);
 	ptp_cleanup_pin_groups(ptp);
 
 	posix_clock_unregister(&ptp->clock);
diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
index 9171d42468fd..6b97155148f1 100644
--- a/drivers/ptp/ptp_private.h
+++ b/drivers/ptp/ptp_private.h
@@ -28,7 +28,7 @@ struct timestamp_event_queue {
 
 struct ptp_clock {
 	struct posix_clock clock;
-	struct device *dev;
+	struct device dev;
 	struct ptp_clock_info *info;
 	dev_t devid;
 	int index; /* index into clocks.map */
diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index fe6cfdcfbc26..5cfe13293243 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -69,29 +69,32 @@ struct posix_clock_operations {
  *
  * @ops: Functional interface to the clock
  * @cdev: Character device instance for this clock
- * @kref: Reference count.
+ * @dev: Pointer to the clock's device.
  * @rwsem: Protects the 'zombie' field from concurrent access.
  * @zombie: If 'zombie' is true, then the hardware has disappeared.
- * @release: A function to free the structure when the reference count reaches
- * zero. May be NULL if structure is statically allocated.
  *
  * Drivers should embed their struct posix_clock within a private
  * structure, obtaining a reference to it during callbacks using
  * container_of().
+ *
+ * Drivers should supply an initialized but not exposed struct device
+ * to posix_clock_register(). It is used to manage lifetime of the
+ * driver's private structure. It's 'release' field should be set to
+ * a release function for this private structure.
  */
 struct posix_clock {
 	struct posix_clock_operations ops;
 	struct cdev cdev;
-	struct kref kref;
+	struct device *dev;
 	struct rw_semaphore rwsem;
 	bool zombie;
-	void (*release)(struct posix_clock *clk);
 };
 
 /**
  * posix_clock_register() - register a new clock
- * @clk: Pointer to the clock. Caller must provide 'ops' and 'release'
- * @devid: Allocated device id
+ * @clk: Pointer to the clock. Caller must provide 'ops' field
+ * @dev: Pointer to the initialized device. Caller must provide
+ * 'release' filed
  *
  * A clock driver calls this function to register itself with the
  * clock device subsystem. If 'clk' points to dynamically allocated
@@ -100,7 +103,7 @@ struct posix_clock {
  *
  * Returns zero on success, non-zero otherwise.
  */
-int posix_clock_register(struct posix_clock *clk, dev_t devid);
+int posix_clock_register(struct posix_clock *clk, struct device *dev);
 
 /**
  * posix_clock_unregister() - unregister a clock
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ec960bb939fd..200fb2d3be99 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -14,8 +14,6 @@
 
 #include "posix-timers.h"
 
-static void delete_clock(struct kref *kref);
-
 /*
  * Returns NULL if the posix_clock instance attached to 'fp' is old and stale.
  */
@@ -125,7 +123,7 @@ static int posix_clock_open(struct inode *inode, struct file *fp)
 		err = 0;
 
 	if (!err) {
-		kref_get(&clk->kref);
+		get_device(clk->dev);
 		fp->private_data = clk;
 	}
 out:
@@ -141,7 +139,7 @@ static int posix_clock_release(struct inode *inode, struct file *fp)
 	if (clk->ops.release)
 		err = clk->ops.release(clk);
 
-	kref_put(&clk->kref, delete_clock);
+	put_device(clk->dev);
 
 	fp->private_data = NULL;
 
@@ -161,38 +159,35 @@ static const struct file_operations posix_clock_file_operations = {
 #endif
 };
 
-int posix_clock_register(struct posix_clock *clk, dev_t devid)
+int posix_clock_register(struct posix_clock *clk, struct device *dev)
 {
 	int err;
 
-	kref_init(&clk->kref);
 	init_rwsem(&clk->rwsem);
 
 	cdev_init(&clk->cdev, &posix_clock_file_operations);
+	err = cdev_device_add(&clk->cdev, dev);
+	if (err) {
+		pr_err("%s unable to add device %d:%d\n",
+			dev_name(dev), MAJOR(dev->devt), MINOR(dev->devt));
+		return err;
+	}
 	clk->cdev.owner = clk->ops.owner;
-	err = cdev_add(&clk->cdev, devid, 1);
+	clk->dev = dev;
 
-	return err;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(posix_clock_register);
 
-static void delete_clock(struct kref *kref)
-{
-	struct posix_clock *clk = container_of(kref, struct posix_clock, kref);
-
-	if (clk->release)
-		clk->release(clk);
-}
-
 void posix_clock_unregister(struct posix_clock *clk)
 {
-	cdev_del(&clk->cdev);
+	cdev_device_del(&clk->cdev, clk->dev);
 
 	down_write(&clk->rwsem);
 	clk->zombie = true;
 	up_write(&clk->rwsem);
 
-	kref_put(&clk->kref, delete_clock);
+	put_device(clk->dev);
 }
 EXPORT_SYMBOL_GPL(posix_clock_unregister);
 
-- 
2.20.1
* Re: [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev

From: Richard Cochran @ 2019-12-27 15:02 UTC
  To: Vladis Dronov
  Cc: linux-fsdevel, Alexander Viro, Al Viro, netdev, linux-kernel

On Fri, Dec 27, 2019 at 03:26:27AM +0100, Vladis Dronov wrote:
> Here cdev is embedded in posix_clock which is embedded in ptp_clock.
> The race happens because ptp_clock's lifetime is controlled by two
> refcounts: kref and cdev.kobj in posix_clock. This is wrong.
>
> Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
> created especially for such cases. This way the parent device with its
> ptp_clock is not released until all references to the cdev are released.
> This adds a requirement that an initialized but not exposed struct
> device should be provided to posix_clock_register() by a caller instead
> of a simple dev_t.
>
> This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
> the race between the release of watchdog_core_data and cdev"). See
> details of the implementation in the commit 233ed09d7fda ("chardev: add
> helper function to register char devs with a struct device").

Thanks for digging into this!

Acked-by: Richard Cochran <richardcochran@gmail.com>

>  /**
>   * posix_clock_register() - register a new clock
> - * @clk: Pointer to the clock. Caller must provide 'ops' and 'release'
> - * @devid: Allocated device id
> + * @clk: Pointer to the clock. Caller must provide 'ops' field
> + * @dev: Pointer to the initialized device. Caller must provide
> + * 'release' filed

field

Thanks,
Richard
* Re: [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev

From: Vladis Dronov @ 2019-12-27 17:24 UTC
  To: Richard Cochran
  Cc: linux-fsdevel, Alexander Viro, Al Viro, netdev, linux-kernel

Hello, Richard,

Thank you for the review!

> > + * @dev: Pointer to the initialized device. Caller must provide
> > + * 'release' filed
>
> field

Indeed. *sigh* Nothing is ideal. Let's hope a maintainer could fix it if
this is approved.

Best regards,
Vladis Dronov | Red Hat, Inc. | The Core Kernel | Senior Software Engineer

----- Original Message -----
> From: "Richard Cochran" <richardcochran@gmail.com>
> To: "Vladis Dronov" <vdronov@redhat.com>
> Cc: linux-fsdevel@vger.kernel.org, "Alexander Viro" <viro@zeniv.linux.org.uk>, "Al Viro" <aviro@redhat.com>,
> netdev@vger.kernel.org, linux-kernel@vger.kernel.org
> Sent: Friday, December 27, 2019 4:02:19 PM
> Subject: Re: [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev
>
> On Fri, Dec 27, 2019 at 03:26:27AM +0100, Vladis Dronov wrote:
> > Here cdev is embedded in posix_clock which is embedded in ptp_clock.
> > The race happens because ptp_clock's lifetime is controlled by two
> > refcounts: kref and cdev.kobj in posix_clock. This is wrong.
> >
> > Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
> > created especially for such cases. This way the parent device with its
> > ptp_clock is not released until all references to the cdev are released.
> > This adds a requirement that an initialized but not exposed struct
> > device should be provided to posix_clock_register() by a caller instead
> > of a simple dev_t.
> >
> > This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
> > the race between the release of watchdog_core_data and cdev"). See
> > details of the implementation in the commit 233ed09d7fda ("chardev: add
> > helper function to register char devs with a struct device").
>
> Thanks for digging into this!
>
> Acked-by: Richard Cochran <richardcochran@gmail.com>
>
> >  /**
> >   * posix_clock_register() - register a new clock
> > - * @clk: Pointer to the clock. Caller must provide 'ops' and 'release'
> > - * @devid: Allocated device id
> > + * @clk: Pointer to the clock. Caller must provide 'ops' field
> > + * @dev: Pointer to the initialized device. Caller must provide
> > + * 'release' filed
>
> field
>
> Thanks,
> Richard
* Re: [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev

From: David Miller @ 2019-12-31 4:19 UTC
  To: vdronov; +Cc: linux-fsdevel, viro, richardcochran, aviro, netdev, linux-kernel

From: Vladis Dronov <vdronov@redhat.com>
Date: Fri, 27 Dec 2019 03:26:27 +0100

> In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
> device is removed, closing this file leads to a race. This reproduces
> easily in a kvm virtual machine:
...
> This happens in:
>
> static void __fput(struct file *file)
> { ...
> 	if (file->f_op->release)
> 		file->f_op->release(inode, file); <<< cdev is kfree'd here
> 	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
> 		     !(mode & FMODE_PATH))) {
> 		cdev_put(inode->i_cdev); <<< cdev fields are accessed here
>
> Namely:
>
> __fput()
>     posix_clock_release()
>         kref_put(&clk->kref, delete_clock) <<< the last reference
>             delete_clock()
>                 delete_ptp_clock()
>                     kfree(ptp) <<< cdev is embedded in ptp
>     cdev_put
>         module_put(p->owner) <<< *p is kfree'd, bang!
>
> Here cdev is embedded in posix_clock which is embedded in ptp_clock.
> The race happens because ptp_clock's lifetime is controlled by two
> refcounts: kref and cdev.kobj in posix_clock. This is wrong.
>
> Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
> created especially for such cases. This way the parent device with its
> ptp_clock is not released until all references to the cdev are released.
> This adds a requirement that an initialized but not exposed struct
> device should be provided to posix_clock_register() by a caller instead
> of a simple dev_t.
>
> This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
> the race between the release of watchdog_core_data and cdev"). See
> details of the implementation in the commit 233ed09d7fda ("chardev: add
> helper function to register char devs with a struct device").
>
> Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
> Analyzed-by: Stephen Johnston <sjohnsto@redhat.com>
> Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com>
> Signed-off-by: Vladis Dronov <vdronov@redhat.com>

Applied, thanks.
Thread overview: 7+ messages
2019-11-25 12:53 [PATCH] fs: fix use-after-free in __fput() when a chardev is removed but a file is still open Vladis Dronov
2019-12-08 19:49 ` Al Viro
2019-12-08 19:53   ` Al Viro
2019-12-27 2:26      ` [PATCH v2] ptp: fix the race between the release of ptp_clock and cdev Vladis Dronov
2019-12-27 15:02       ` Richard Cochran
2019-12-27 17:24         ` Vladis Dronov
2019-12-31 4:19        ` David Miller