* [PATCH 0/5] Support hot-unbind in IOAT @ 2019-12-16 19:01 Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 1/5] dmaengine: Store module owner in dma_device struct Logan Gunthorpe ` (5 more replies) 0 siblings, 6 replies; 25+ messages in thread From: Logan Gunthorpe @ 2019-12-16 19:01 UTC (permalink / raw) To: linux-kernel, dmaengine, Vinod Koul Cc: Dan Williams, Dave Jiang, Kit Chow, Logan Gunthorpe Hey, This patchset creates some common infrastructure which I will use in the next version of the PLX driver. It adds a reference count to the dma_device struct which is taken and released every time a channel is allocated or freed. A call back is used to allow the driver to free the underlying memory and do any final cleanup. For a use-case, I've adjusted the ioat driver to properly support hot-unbind. The driver was already pretty close as it already had a shutdown state; so it mostly only required freeing the memory correctly and calling ioat_shutdown at the correct time. This patchset is based on v5.5-rc2 and a git branch is available here: https://github.com/sbates130272/linux-p2pmem/ ioat-hot-unbind Thanks, Logan -- Logan Gunthorpe (5): dmaengine: Store module owner in dma_device struct dmaengine: Call module_put() after device_free_chan_resources() dmaengine: Move dma_channel_rebalance() infrastructure up in code dmaengine: Add reference counting to dma_device struct dmaengine: ioat: Support in-use unbind drivers/dma/dmaengine.c | 352 +++++++++++++++++++++----------------- drivers/dma/ioat/init.c | 38 ++-- include/linux/dmaengine.h | 10 +- 3 files changed, 233 insertions(+), 167 deletions(-) -- 2.20.1 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-12-16 19:01 [PATCH 0/5] Support hot-unbind in IOAT Logan Gunthorpe @ 2019-12-16 19:01 ` Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 2/5] dmaengine: Call module_put() after device_free_chan_resources() Logan Gunthorpe ` (4 subsequent siblings) 5 siblings, 0 replies; 25+ messages in thread From: Logan Gunthorpe @ 2019-12-16 19:01 UTC (permalink / raw) To: linux-kernel, dmaengine, Vinod Koul Cc: Dan Williams, Dave Jiang, Kit Chow, Logan Gunthorpe dma_chan_to_owner() dereferences the driver from the struct device to obtain the owner and call module_[get|put](). However, if the backing device is unbound before the dma_device is unregistered, the driver will be cleared and this will cause a NULL pointer dereference. Instead, store a pointer to the owner module in the dma_device struct so the module reference can be properly put when the channel is put, even if the backing device was destroyed first. This change helps to support a safer unbind of DMA engines. If the dma_device is unregistered in the driver's remove function, there's no guarantee that there are no existing clients and a users action may trigger the WARN_ONCE in dma_async_device_unregister() which is unlikely to leave the system in a consistent state. Instead, a better approach is to allow the backing driver to go away and fail any subsequent requests to it. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- drivers/dma/dmaengine.c | 4 +++- include/linux/dmaengine.h | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 03ac4b96117c..4b604086b1b3 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -179,7 +179,7 @@ __dma_device_satisfies_mask(struct dma_device *device, static struct module *dma_chan_to_owner(struct dma_chan *chan) { - return chan->device->dev->driver->owner; + return chan->device->owner; } /** @@ -919,6 +919,8 @@ int dma_async_device_register(struct dma_device *device) return -EIO; } + device->owner = device->dev->driver->owner; + if (dma_has_cap(DMA_MEMCPY, device->cap_mask) && !device->device_prep_dma_memcpy) { dev_err(device->dev, "Device claims capability %s, but op is not defined\n", diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 8fcdee1c0cf9..13aa0abb71de 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -674,6 +674,7 @@ struct dma_filter { * @fill_align: alignment shift for memset operations * @dev_id: unique device ID * @dev: struct device reference for dma mapping api + * @owner: owner module (automatically set based on the provided dev) * @src_addr_widths: bit mask of src addr widths the device supports * Width is specified in bytes, e.g. for a device supporting * a width of 4 the mask should have BIT(4) set. @@ -737,6 +738,7 @@ struct dma_device { int dev_id; struct device *dev; + struct module *owner; u32 src_addr_widths; u32 dst_addr_widths; -- 2.20.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 2/5] dmaengine: Call module_put() after device_free_chan_resources() 2019-12-16 19:01 [PATCH 0/5] Support hot-unbind in IOAT Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 1/5] dmaengine: Store module owner in dma_device struct Logan Gunthorpe @ 2019-12-16 19:01 ` Logan Gunthorpe 2019-12-24 4:37 ` Vinod Koul 2019-12-16 19:01 ` [PATCH 3/5] dmaengine: Move dma_channel_rebalance() infrastructure up in code Logan Gunthorpe ` (3 subsequent siblings) 5 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2019-12-16 19:01 UTC (permalink / raw) To: linux-kernel, dmaengine, Vinod Koul Cc: Dan Williams, Dave Jiang, Kit Chow, Logan Gunthorpe The module reference is taken to ensure the callbacks still exist when they are called. If the channel holds the last reference to the module, the module can disappear before device_free_chan_resources() is called and would cause a call into free'd memory. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- drivers/dma/dmaengine.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 4b604086b1b3..776fdf535a3a 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -250,7 +250,6 @@ static void dma_chan_put(struct dma_chan *chan) return; chan->client_count--; - module_put(dma_chan_to_owner(chan)); /* This channel is not in use anymore, free it */ if (!chan->client_count && chan->device->device_free_chan_resources) { @@ -259,6 +258,8 @@ static void dma_chan_put(struct dma_chan *chan) chan->device->device_free_chan_resources(chan); } + module_put(dma_chan_to_owner(chan)); + /* If the channel is used via a DMA request router, free the mapping */ if (chan->router && chan->router->route_free) { chan->router->route_free(chan->router->dev, chan->route_data); -- 2.20.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 2/5] dmaengine: Call module_put() after device_free_chan_resources() 2019-12-16 19:01 ` [PATCH 2/5] dmaengine: Call module_put() after device_free_chan_resources() Logan Gunthorpe @ 2019-12-24 4:37 ` Vinod Koul 0 siblings, 0 replies; 25+ messages in thread From: Vinod Koul @ 2019-12-24 4:37 UTC (permalink / raw) To: Logan Gunthorpe Cc: linux-kernel, dmaengine, Dan Williams, Dave Jiang, Kit Chow On 16-12-19, 12:01, Logan Gunthorpe wrote: > The module reference is taken to ensure the callbacks still exist > when they are called. If the channel holds the last reference to the > module, the module can disappear before device_free_chan_resources() is > called and would cause a call into free'd memory. > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > --- > drivers/dma/dmaengine.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c > index 4b604086b1b3..776fdf535a3a 100644 > --- a/drivers/dma/dmaengine.c > +++ b/drivers/dma/dmaengine.c > @@ -250,7 +250,6 @@ static void dma_chan_put(struct dma_chan *chan) > return; > > chan->client_count--; > - module_put(dma_chan_to_owner(chan)); > > /* This channel is not in use anymore, free it */ > if (!chan->client_count && chan->device->device_free_chan_resources) { > @@ -259,6 +258,8 @@ static void dma_chan_put(struct dma_chan *chan) > chan->device->device_free_chan_resources(chan); > } > > + module_put(dma_chan_to_owner(chan)); > + > /* If the channel is used via a DMA request router, free the mapping */ > if (chan->router && chan->router->route_free) { > chan->router->route_free(chan->router->dev, chan->route_data); I think this should be moved here after route_free() and will take care of route cleanup as well Thanks -- ~Vinod ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 3/5] dmaengine: Move dma_channel_rebalance() infrastructure up in code 2019-12-16 19:01 [PATCH 0/5] Support hot-unbind in IOAT Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 1/5] dmaengine: Store module owner in dma_device struct Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 2/5] dmaengine: Call module_put() after device_free_chan_resources() Logan Gunthorpe @ 2019-12-16 19:01 ` Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 4/5] dmaengine: Add reference counting to dma_device struct Logan Gunthorpe ` (2 subsequent siblings) 5 siblings, 0 replies; 25+ messages in thread From: Logan Gunthorpe @ 2019-12-16 19:01 UTC (permalink / raw) To: linux-kernel, dmaengine, Vinod Koul Cc: Dan Williams, Dave Jiang, Kit Chow, Logan Gunthorpe So it can be called by a release function which is needed higher up in the code. No functional changes intended. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- drivers/dma/dmaengine.c | 288 ++++++++++++++++++++-------------------- 1 file changed, 144 insertions(+), 144 deletions(-) diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 776fdf535a3a..1f9a6293f15a 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -164,6 +164,150 @@ static struct class dma_devclass = { /* --- client and device registration --- */ +/** + * dma_cap_mask_all - enable iteration over all operation types + */ +static dma_cap_mask_t dma_cap_mask_all; + +/** + * dma_chan_tbl_ent - tracks channel allocations per core/operation + * @chan - associated channel for this entry + */ +struct dma_chan_tbl_ent { + struct dma_chan *chan; +}; + +/** + * channel_table - percpu lookup table for memory-to-memory offload providers + */ +static struct dma_chan_tbl_ent __percpu *channel_table[DMA_TX_TYPE_END]; + +static int __init dma_channel_table_init(void) +{ + enum dma_transaction_type cap; + int err = 0; + + bitmap_fill(dma_cap_mask_all.bits, DMA_TX_TYPE_END); + + /* 'interrupt', 'private', and 'slave' are channel capabilities, + * but are not associated with an operation so they do not need + * an entry in the channel_table + */ + clear_bit(DMA_INTERRUPT, dma_cap_mask_all.bits); + clear_bit(DMA_PRIVATE, dma_cap_mask_all.bits); + clear_bit(DMA_SLAVE, dma_cap_mask_all.bits); + + for_each_dma_cap_mask(cap, dma_cap_mask_all) { + channel_table[cap] = alloc_percpu(struct dma_chan_tbl_ent); + if (!channel_table[cap]) { + err = -ENOMEM; + break; + } + } + + if (err) { + pr_err("initialization failure\n"); + for_each_dma_cap_mask(cap, dma_cap_mask_all) + free_percpu(channel_table[cap]); + } + + return err; +} +arch_initcall(dma_channel_table_init); + +/** + * dma_chan_is_local - returns true if the channel is in the same numa-node as + * the cpu + */ +static bool dma_chan_is_local(struct dma_chan *chan, int cpu) +{ + int node = dev_to_node(chan->device->dev); + return node == NUMA_NO_NODE || + cpumask_test_cpu(cpu, cpumask_of_node(node)); +} + +/** + * min_chan - returns the channel with min count and in the same numa-node as + * the cpu + * @cap: capability to match + * @cpu: cpu index which the channel should be close to + * + * If some channels are close to the given cpu, the one with the lowest + * reference count is returned. Otherwise, cpu is ignored and only the + * reference count is taken into account. + * Must be called under dma_list_mutex. + */ +static struct dma_chan *min_chan(enum dma_transaction_type cap, int cpu) +{ + struct dma_device *device; + struct dma_chan *chan; + struct dma_chan *min = NULL; + struct dma_chan *localmin = NULL; + + list_for_each_entry(device, &dma_device_list, global_node) { + if (!dma_has_cap(cap, device->cap_mask) || + dma_has_cap(DMA_PRIVATE, device->cap_mask)) + continue; + list_for_each_entry(chan, &device->channels, device_node) { + if (!chan->client_count) + continue; + if (!min || chan->table_count < min->table_count) + min = chan; + + if (dma_chan_is_local(chan, cpu)) + if (!localmin || + chan->table_count < localmin->table_count) + localmin = chan; + } + } + + chan = localmin ? localmin : min; + + if (chan) + chan->table_count++; + + return chan; +} + +/** + * dma_channel_rebalance - redistribute the available channels + * + * Optimize for cpu isolation (each cpu gets a dedicated channel for an + * operation type) in the SMP case, and operation isolation (avoid + * multi-tasking channels) in the non-SMP case. Must be called under + * dma_list_mutex. + */ +static void dma_channel_rebalance(void) +{ + struct dma_chan *chan; + struct dma_device *device; + int cpu; + int cap; + + /* undo the last distribution */ + for_each_dma_cap_mask(cap, dma_cap_mask_all) + for_each_possible_cpu(cpu) + per_cpu_ptr(channel_table[cap], cpu)->chan = NULL; + + list_for_each_entry(device, &dma_device_list, global_node) { + if (dma_has_cap(DMA_PRIVATE, device->cap_mask)) + continue; + list_for_each_entry(chan, &device->channels, device_node) + chan->table_count = 0; + } + + /* don't populate the channel_table if no clients are available */ + if (!dmaengine_ref_count) + return; + + /* redistribute available channels */ + for_each_dma_cap_mask(cap, dma_cap_mask_all) + for_each_online_cpu(cpu) { + chan = min_chan(cap, cpu); + per_cpu_ptr(channel_table[cap], cpu)->chan = chan; + } +} + #define dma_device_satisfies_mask(device, mask) \ __dma_device_satisfies_mask((device), &(mask)) static int @@ -289,57 +433,6 @@ enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie) } EXPORT_SYMBOL(dma_sync_wait); -/** - * dma_cap_mask_all - enable iteration over all operation types - */ -static dma_cap_mask_t dma_cap_mask_all; - -/** - * dma_chan_tbl_ent - tracks channel allocations per core/operation - * @chan - associated channel for this entry - */ -struct dma_chan_tbl_ent { - struct dma_chan *chan; -}; - -/** - * channel_table - percpu lookup table for memory-to-memory offload providers - */ -static struct dma_chan_tbl_ent __percpu *channel_table[DMA_TX_TYPE_END]; - -static int __init dma_channel_table_init(void) -{ - enum dma_transaction_type cap; - int err = 0; - - bitmap_fill(dma_cap_mask_all.bits, DMA_TX_TYPE_END); - - /* 'interrupt', 'private', and 'slave' are channel capabilities, - * but are not associated with an operation so they do not need - * an entry in the channel_table - */ - clear_bit(DMA_INTERRUPT, dma_cap_mask_all.bits); - clear_bit(DMA_PRIVATE, dma_cap_mask_all.bits); - clear_bit(DMA_SLAVE, dma_cap_mask_all.bits); - - for_each_dma_cap_mask(cap, dma_cap_mask_all) { - channel_table[cap] = alloc_percpu(struct dma_chan_tbl_ent); - if (!channel_table[cap]) { - err = -ENOMEM; - break; - } - } - - if (err) { - pr_err("initialization failure\n"); - for_each_dma_cap_mask(cap, dma_cap_mask_all) - free_percpu(channel_table[cap]); - } - - return err; -} -arch_initcall(dma_channel_table_init); - /** * dma_find_channel - find a channel to carry out the operation * @tx_type: transaction type @@ -370,97 +463,6 @@ void dma_issue_pending_all(void) } EXPORT_SYMBOL(dma_issue_pending_all); -/** - * dma_chan_is_local - returns true if the channel is in the same numa-node as the cpu - */ -static bool dma_chan_is_local(struct dma_chan *chan, int cpu) -{ - int node = dev_to_node(chan->device->dev); - return node == NUMA_NO_NODE || - cpumask_test_cpu(cpu, cpumask_of_node(node)); -} - -/** - * min_chan - returns the channel with min count and in the same numa-node as the cpu - * @cap: capability to match - * @cpu: cpu index which the channel should be close to - * - * If some channels are close to the given cpu, the one with the lowest - * reference count is returned. Otherwise, cpu is ignored and only the - * reference count is taken into account. - * Must be called under dma_list_mutex. - */ -static struct dma_chan *min_chan(enum dma_transaction_type cap, int cpu) -{ - struct dma_device *device; - struct dma_chan *chan; - struct dma_chan *min = NULL; - struct dma_chan *localmin = NULL; - - list_for_each_entry(device, &dma_device_list, global_node) { - if (!dma_has_cap(cap, device->cap_mask) || - dma_has_cap(DMA_PRIVATE, device->cap_mask)) - continue; - list_for_each_entry(chan, &device->channels, device_node) { - if (!chan->client_count) - continue; - if (!min || chan->table_count < min->table_count) - min = chan; - - if (dma_chan_is_local(chan, cpu)) - if (!localmin || - chan->table_count < localmin->table_count) - localmin = chan; - } - } - - chan = localmin ? localmin : min; - - if (chan) - chan->table_count++; - - return chan; -} - -/** - * dma_channel_rebalance - redistribute the available channels - * - * Optimize for cpu isolation (each cpu gets a dedicated channel for an - * operation type) in the SMP case, and operation isolation (avoid - * multi-tasking channels) in the non-SMP case. Must be called under - * dma_list_mutex. - */ -static void dma_channel_rebalance(void) -{ - struct dma_chan *chan; - struct dma_device *device; - int cpu; - int cap; - - /* undo the last distribution */ - for_each_dma_cap_mask(cap, dma_cap_mask_all) - for_each_possible_cpu(cpu) - per_cpu_ptr(channel_table[cap], cpu)->chan = NULL; - - list_for_each_entry(device, &dma_device_list, global_node) { - if (dma_has_cap(DMA_PRIVATE, device->cap_mask)) - continue; - list_for_each_entry(chan, &device->channels, device_node) - chan->table_count = 0; - } - - /* don't populate the channel_table if no clients are available */ - if (!dmaengine_ref_count) - return; - - /* redistribute available channels */ - for_each_dma_cap_mask(cap, dma_cap_mask_all) - for_each_online_cpu(cpu) { - chan = min_chan(cap, cpu); - per_cpu_ptr(channel_table[cap], cpu)->chan = chan; - } -} - int dma_get_slave_caps(struct dma_chan *chan, struct dma_slave_caps *caps) { struct dma_device *device; @@ -1376,5 +1378,3 @@ static int __init dma_bus_init(void) return class_register(&dma_devclass); } arch_initcall(dma_bus_init); - - -- 2.20.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 4/5] dmaengine: Add reference counting to dma_device struct 2019-12-16 19:01 [PATCH 0/5] Support hot-unbind in IOAT Logan Gunthorpe ` (2 preceding siblings ...) 2019-12-16 19:01 ` [PATCH 3/5] dmaengine: Move dma_channel_rebalance() infrastructure up in code Logan Gunthorpe @ 2019-12-16 19:01 ` Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 5/5] dmaengine: ioat: Support in-use unbind Logan Gunthorpe 2019-12-24 4:50 ` [PATCH 0/5] Support hot-unbind in IOAT Vinod Koul 5 siblings, 0 replies; 25+ messages in thread From: Logan Gunthorpe @ 2019-12-16 19:01 UTC (permalink / raw) To: linux-kernel, dmaengine, Vinod Koul Cc: Dan Williams, Dave Jiang, Kit Chow, Logan Gunthorpe Adding a reference count helps drivers to properly implement the unbind while in use case. References are taken and put every time a channel is allocated or freed. Once the final reference is put, the device is removed from the dma_device_list and a release callback function is called to signal the driver to free the memory. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- drivers/dma/dmaengine.c | 57 +++++++++++++++++++++++++++++++++------ include/linux/dmaengine.h | 8 +++++- 2 files changed, 56 insertions(+), 9 deletions(-) diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 1f9a6293f15a..e316abe3672d 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -342,6 +342,23 @@ static void balance_ref_count(struct dma_chan *chan) } } +static void dma_device_release(struct kref *ref) +{ + struct dma_device *device = container_of(ref, struct dma_device, ref); + + list_del_rcu(&device->global_node); + dma_channel_rebalance(); + + if (device->device_release) + device->device_release(device); +} + +static void dma_device_put(struct dma_device *device) +{ + lockdep_assert_held(&dma_list_mutex); + kref_put(&device->ref, dma_device_release); +} + /** * dma_chan_get - try to grab a dma channel's parent driver module * @chan - channel to grab @@ -362,6 +379,12 @@ static int dma_chan_get(struct dma_chan *chan) if (!try_module_get(owner)) return -ENODEV; + ret = kref_get_unless_zero(&chan->device->ref); + if (!ret) { + ret = -ENODEV; + goto module_put_out; + } + /* allocate upon first client reference */ if (chan->device->device_alloc_chan_resources) { ret = chan->device->device_alloc_chan_resources(chan); @@ -377,6 +400,8 @@ static int dma_chan_get(struct dma_chan *chan) return 0; err_out: + dma_device_put(chan->device); +module_put_out: module_put(owner); return ret; } @@ -402,6 +427,7 @@ static void dma_chan_put(struct dma_chan *chan) chan->device->device_free_chan_resources(chan); } + dma_device_put(chan->device); module_put(dma_chan_to_owner(chan)); /* If the channel is used via a DMA request router, free the mapping */ @@ -837,14 +863,14 @@ EXPORT_SYMBOL(dmaengine_get); */ void dmaengine_put(void) { - struct dma_device *device; + struct dma_device *device, *_d; struct dma_chan *chan; mutex_lock(&dma_list_mutex); dmaengine_ref_count--; BUG_ON(dmaengine_ref_count < 0); /* drop channel references */ - list_for_each_entry(device, &dma_device_list, global_node) { + list_for_each_entry_safe(device, _d, &dma_device_list, global_node) { if (dma_has_cap(DMA_PRIVATE, device->cap_mask)) continue; list_for_each_entry(chan, &device->channels, device_node) @@ -906,6 +932,10 @@ static int get_dma_id(struct dma_device *device) /** * dma_async_device_register - registers DMA devices found * @device: &dma_device + * + * After calling this routine the structure should not be freed except in the + * device_release() callback which will be called after + * dma_async_device_unregister() is called and no further references are taken. */ int dma_async_device_register(struct dma_device *device) { @@ -999,6 +1029,12 @@ int dma_async_device_register(struct dma_device *device) return -EIO; } + if (!device->device_release) + dev_warn(device->dev, + "WARN: Device release is not defined so it is not safe to unbind this driver while in use\n"); + + kref_init(&device->ref); + /* note: this only matters in the * CONFIG_ASYNC_TX_ENABLE_CHANNEL_SWITCH=n case */ @@ -1115,13 +1151,8 @@ void dma_async_device_unregister(struct dma_device *device) { struct dma_chan *chan; - mutex_lock(&dma_list_mutex); - list_del_rcu(&device->global_node); - dma_channel_rebalance(); - mutex_unlock(&dma_list_mutex); - list_for_each_entry(chan, &device->channels, device_node) { - WARN_ONCE(chan->client_count, + WARN_ONCE(!device->device_release && chan->client_count, "%s called while %d clients hold a reference\n", __func__, chan->client_count); mutex_lock(&dma_list_mutex); @@ -1130,6 +1161,16 @@ void dma_async_device_unregister(struct dma_device *device) device_unregister(&chan->dev->device); free_percpu(chan->local); } + + mutex_lock(&dma_list_mutex); + /* + * setting DMA_PRIVATE ensures the device being torn down will not + * be used in the channel_table + */ + dma_cap_set(DMA_PRIVATE, device->cap_mask); + dma_channel_rebalance(); + dma_device_put(device); + mutex_unlock(&dma_list_mutex); } EXPORT_SYMBOL(dma_async_device_unregister); diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 13aa0abb71de..61e59a544af2 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -719,9 +719,14 @@ struct dma_filter { * will just return a simple status code * @device_issue_pending: push pending transactions to hardware * @descriptor_reuse: a submitted transfer can be resubmitted after completion + * @device_release: called sometime atfer dma_async_device_unregister() is + * called and there are no further references to this structure. This + * must be implemented to free resources however many existing drivers + * do not and are therefore not safe to unbind while in use. + * */ struct dma_device { - + struct kref ref; unsigned int chancnt; unsigned int privatecnt; struct list_head channels; @@ -802,6 +807,7 @@ struct dma_device { dma_cookie_t cookie, struct dma_tx_state *txstate); void (*device_issue_pending)(struct dma_chan *chan); + void (*device_release)(struct dma_device *dev); }; static inline int dmaengine_slave_config(struct dma_chan *chan, -- 2.20.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 5/5] dmaengine: ioat: Support in-use unbind 2019-12-16 19:01 [PATCH 0/5] Support hot-unbind in IOAT Logan Gunthorpe ` (3 preceding siblings ...) 2019-12-16 19:01 ` [PATCH 4/5] dmaengine: Add reference counting to dma_device struct Logan Gunthorpe @ 2019-12-16 19:01 ` Logan Gunthorpe 2019-12-17 16:33 ` Dave Jiang 2019-12-24 4:50 ` [PATCH 0/5] Support hot-unbind in IOAT Vinod Koul 5 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2019-12-16 19:01 UTC (permalink / raw) To: linux-kernel, dmaengine, Vinod Koul Cc: Dan Williams, Dave Jiang, Kit Chow, Logan Gunthorpe Don't allocate memory using the devm infrastructure and instead call kfree with the new dmaengine device_release call back. This ensures the structures are available until the last reference is dropped. We also need to ensure we call ioat_shutdown() in ioat_remove() so that all the channels are quiesced and further transaction fails. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- drivers/dma/ioat/init.c | 38 ++++++++++++++++++++++++++------------ 1 file changed, 26 insertions(+), 12 deletions(-) diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c index a6a6dc432db8..60e9afbb896c 100644 --- a/drivers/dma/ioat/init.c +++ b/drivers/dma/ioat/init.c @@ -556,10 +556,6 @@ static void ioat_dma_remove(struct ioatdma_device *ioat_dma) ioat_kobject_del(ioat_dma); dma_async_device_unregister(dma); - - dma_pool_destroy(ioat_dma->completion_pool); - - INIT_LIST_HEAD(&dma->channels); } /** @@ -589,7 +585,7 @@ static void ioat_enumerate_channels(struct ioatdma_device *ioat_dma) dev_dbg(dev, "%s: xfercap = %d\n", __func__, 1 << xfercap_log); for (i = 0; i < dma->chancnt; i++) { - ioat_chan = devm_kzalloc(dev, sizeof(*ioat_chan), GFP_KERNEL); + ioat_chan = kzalloc(sizeof(*ioat_chan), GFP_KERNEL); if (!ioat_chan) break; @@ -624,12 +620,16 @@ static void ioat_free_chan_resources(struct dma_chan *c) return; ioat_stop(ioat_chan); - ioat_reset_hw(ioat_chan); - /* Put LTR to idle */ - if (ioat_dma->version >= IOAT_VER_3_4) - writeb(IOAT_CHAN_LTR_SWSEL_IDLE, - ioat_chan->reg_base + IOAT_CHAN_LTR_SWSEL_OFFSET); + if (!test_bit(IOAT_CHAN_DOWN, &ioat_chan->state)) { + ioat_reset_hw(ioat_chan); + + /* Put LTR to idle */ + if (ioat_dma->version >= IOAT_VER_3_4) + writeb(IOAT_CHAN_LTR_SWSEL_IDLE, + ioat_chan->reg_base + + IOAT_CHAN_LTR_SWSEL_OFFSET); + } spin_lock_bh(&ioat_chan->cleanup_lock); spin_lock_bh(&ioat_chan->prep_lock); @@ -1322,16 +1322,28 @@ static struct pci_driver ioat_pci_driver = { .err_handler = &ioat_err_handler, }; +static void release_ioatdma(struct dma_device *device) +{ + struct ioatdma_device *d = to_ioatdma_device(device); + int i; + + for (i = 0; i < IOAT_MAX_CHANS; i++) + kfree(d->idx[i]); + + dma_pool_destroy(d->completion_pool); + kfree(d); +} + static struct ioatdma_device * alloc_ioatdma(struct pci_dev *pdev, void __iomem *iobase) { - struct device *dev = &pdev->dev; - struct ioatdma_device *d = devm_kzalloc(dev, sizeof(*d), GFP_KERNEL); + struct ioatdma_device *d = kzalloc(sizeof(*d), GFP_KERNEL); if (!d) return NULL; d->pdev = pdev; d->reg_base = iobase; + d->dma_dev.device_release = release_ioatdma; return d; } @@ -1400,6 +1412,8 @@ static void ioat_remove(struct pci_dev *pdev) if (!device) return; + ioat_shutdown(pdev); + dev_err(&pdev->dev, "Removing dma and dca services\n"); if (device->dca) { unregister_dca_provider(device->dca, &pdev->dev); -- 2.20.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 5/5] dmaengine: ioat: Support in-use unbind 2019-12-16 19:01 ` [PATCH 5/5] dmaengine: ioat: Support in-use unbind Logan Gunthorpe @ 2019-12-17 16:33 ` Dave Jiang 0 siblings, 0 replies; 25+ messages in thread From: Dave Jiang @ 2019-12-17 16:33 UTC (permalink / raw) To: Logan Gunthorpe, linux-kernel, dmaengine, Vinod Koul Cc: Williams, Dan J, Kit Chow On 12/16/19 12:01 PM, Logan Gunthorpe wrote: > Don't allocate memory using the devm infrastructure and instead call > kfree with the new dmaengine device_release call back. This ensures > the structures are available until the last reference is dropped. > > We also need to ensure we call ioat_shutdown() in ioat_remove() so > that all the channels are quiesced and further transaction fails. > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> > --- > drivers/dma/ioat/init.c | 38 ++++++++++++++++++++++++++------------ > 1 file changed, 26 insertions(+), 12 deletions(-) > > diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c > index a6a6dc432db8..60e9afbb896c 100644 > --- a/drivers/dma/ioat/init.c > +++ b/drivers/dma/ioat/init.c > @@ -556,10 +556,6 @@ static void ioat_dma_remove(struct ioatdma_device *ioat_dma) > ioat_kobject_del(ioat_dma); > > dma_async_device_unregister(dma); > - > - dma_pool_destroy(ioat_dma->completion_pool); > - > - INIT_LIST_HEAD(&dma->channels); > } > > /** > @@ -589,7 +585,7 @@ static void ioat_enumerate_channels(struct ioatdma_device *ioat_dma) > dev_dbg(dev, "%s: xfercap = %d\n", __func__, 1 << xfercap_log); > > for (i = 0; i < dma->chancnt; i++) { > - ioat_chan = devm_kzalloc(dev, sizeof(*ioat_chan), GFP_KERNEL); > + ioat_chan = kzalloc(sizeof(*ioat_chan), GFP_KERNEL); > if (!ioat_chan) > break; > > @@ -624,12 +620,16 @@ static void ioat_free_chan_resources(struct dma_chan *c) > return; > > ioat_stop(ioat_chan); > - ioat_reset_hw(ioat_chan); > > - /* Put LTR to idle */ > - if (ioat_dma->version >= IOAT_VER_3_4) > - writeb(IOAT_CHAN_LTR_SWSEL_IDLE, > - ioat_chan->reg_base + IOAT_CHAN_LTR_SWSEL_OFFSET); > + if (!test_bit(IOAT_CHAN_DOWN, &ioat_chan->state)) { > + ioat_reset_hw(ioat_chan); > + > + /* Put LTR to idle */ > + if (ioat_dma->version >= IOAT_VER_3_4) > + writeb(IOAT_CHAN_LTR_SWSEL_IDLE, > + ioat_chan->reg_base + > + IOAT_CHAN_LTR_SWSEL_OFFSET); > + } > > spin_lock_bh(&ioat_chan->cleanup_lock); > spin_lock_bh(&ioat_chan->prep_lock); > @@ -1322,16 +1322,28 @@ static struct pci_driver ioat_pci_driver = { > .err_handler = &ioat_err_handler, > }; > > +static void release_ioatdma(struct dma_device *device) > +{ > + struct ioatdma_device *d = to_ioatdma_device(device); > + int i; > + > + for (i = 0; i < IOAT_MAX_CHANS; i++) > + kfree(d->idx[i]); > + > + dma_pool_destroy(d->completion_pool); > + kfree(d); > +} > + > static struct ioatdma_device * > alloc_ioatdma(struct pci_dev *pdev, void __iomem *iobase) > { > - struct device *dev = &pdev->dev; > - struct ioatdma_device *d = devm_kzalloc(dev, sizeof(*d), GFP_KERNEL); > + struct ioatdma_device *d = kzalloc(sizeof(*d), GFP_KERNEL); > > if (!d) > return NULL; > d->pdev = pdev; > d->reg_base = iobase; > + d->dma_dev.device_release = release_ioatdma; > return d; > } > > @@ -1400,6 +1412,8 @@ static void ioat_remove(struct pci_dev *pdev) > if (!device) > return; > > + ioat_shutdown(pdev); > + > dev_err(&pdev->dev, "Removing dma and dca services\n"); > if (device->dca) { > unregister_dca_provider(device->dca, &pdev->dev); > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/5] Support hot-unbind in IOAT 2019-12-16 19:01 [PATCH 0/5] Support hot-unbind in IOAT Logan Gunthorpe ` (4 preceding siblings ...) 2019-12-16 19:01 ` [PATCH 5/5] dmaengine: ioat: Support in-use unbind Logan Gunthorpe @ 2019-12-24 4:50 ` Vinod Koul 2019-12-25 2:25 ` Logan Gunthorpe 5 siblings, 1 reply; 25+ messages in thread From: Vinod Koul @ 2019-12-24 4:50 UTC (permalink / raw) To: Logan Gunthorpe Cc: linux-kernel, dmaengine, Dan Williams, Dave Jiang, Kit Chow On 16-12-19, 12:01, Logan Gunthorpe wrote: > Hey, > > This patchset creates some common infrastructure which I will use in the > next version of the PLX driver. It adds a reference count to the > dma_device struct which is taken and released every time a channel > is allocated or freed. A call back is used to allow the driver to > free the underlying memory and do any final cleanup. > > For a use-case, I've adjusted the ioat driver to properly support > hot-unbind. The driver was already pretty close as it already had > a shutdown state; so it mostly only required freeing the memory > correctly and calling ioat_shutdown at the correct time. I didnt find anything else (apart from one change i pointed), so I have applied this and will fix the comment. No point in delaying this -- ~Vinod ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/5] Support hot-unbind in IOAT 2019-12-24 4:50 ` [PATCH 0/5] Support hot-unbind in IOAT Vinod Koul @ 2019-12-25 2:25 ` Logan Gunthorpe 0 siblings, 0 replies; 25+ messages in thread From: Logan Gunthorpe @ 2019-12-25 2:25 UTC (permalink / raw) To: Vinod Koul; +Cc: linux-kernel, dmaengine, Dan Williams, Dave Jiang, Kit Chow On 2019-12-23 9:50 p.m., Vinod Koul wrote: > On 16-12-19, 12:01, Logan Gunthorpe wrote: >> Hey, >> >> This patchset creates some common infrastructure which I will use in the >> next version of the PLX driver. It adds a reference count to the >> dma_device struct which is taken and released every time a channel >> is allocated or freed. A call back is used to allow the driver to >> free the underlying memory and do any final cleanup. >> >> For a use-case, I've adjusted the ioat driver to properly support >> hot-unbind. The driver was already pretty close as it already had >> a shutdown state; so it mostly only required freeing the memory >> correctly and calling ioat_shutdown at the correct time. > > I didnt find anything else (apart from one change i pointed), so I have > applied this and will fix the comment. No point in delaying this Great, thanks! Logan ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 0/5] PLX Switch DMA Engine Driver @ 2019-10-22 21:46 Logan Gunthorpe 2019-10-22 21:46 ` [PATCH 1/5] dmaengine: Store module owner in dma_device struct Logan Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2019-10-22 21:46 UTC (permalink / raw) To: linux-kernel, dmaengine, Vinod Koul; +Cc: Dan Williams, Logan Gunthorpe Hi, The following patches introduce a driver to use the DMA hardware inside PLX ExpressLane PEX Switches. The DMA devices appear as one or more PCI virtual functions per channel and can serve similar use cases as Intel's IOAT driver. The first two patches in this series fix some bugs that are required to support cases where the driver is unbound while the DMA engine is in use. Without these patches the kernel will panic when that happens. The final three patches implement the driver itself. This patchset is based off of v5.4-rc4 and a git branch is available here: https://github.com/sbates130272/linux-p2pmem/ plx-dma Thanks, Logan -- Logan Gunthorpe (5): dmaengine: Store module owner in dma_device struct dmaengine: Call module_put() after device_free_chan_resources() dmaengine: plx-dma: Introduce PLX DMA engine PCI driver skeleton dmaengine: plx-dma: Implement hardware initialization and cleanup dmaengine: plx-dma: Implement descriptor submission MAINTAINERS | 5 + drivers/dma/Kconfig | 9 + drivers/dma/Makefile | 1 + drivers/dma/dmaengine.c | 7 +- drivers/dma/plx_dma.c | 658 ++++++++++++++++++++++++++++++++++++++ include/linux/dmaengine.h | 2 + 6 files changed, 680 insertions(+), 2 deletions(-) create mode 100644 drivers/dma/plx_dma.c -- 2.20.1 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-10-22 21:46 [PATCH 0/5] PLX Switch DMA Engine Driver Logan Gunthorpe @ 2019-10-22 21:46 ` Logan Gunthorpe 2019-11-09 17:18 ` Vinod Koul 0 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2019-10-22 21:46 UTC (permalink / raw) To: linux-kernel, dmaengine, Vinod Koul; +Cc: Dan Williams, Logan Gunthorpe dma_chan_to_owner() dereferences the driver from the struct device to obtain the owner and call module_[get|put](). However, if the backing device is unbound before the dma_device is unregistered, the driver will be cleared and this will cause a NULL pointer dereference. Instead, store a pointer to the owner module in the dma_device struct so the module reference can be properly put when the channel is put, even if the backing device was destroyed first. This change helps to support a safer unbind of DMA engines. If the dma_device is unregistered in the driver's remove function, there's no guarantee that there are no existing clients and a users action may trigger the WARN_ONCE in dma_async_device_unregister() which is unlikely to leave the system in a consistent state. Instead, a better approach is to allow the backing driver to go away and fail any subsequent requests to it. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- drivers/dma/dmaengine.c | 4 +++- include/linux/dmaengine.h | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 03ac4b96117c..4b604086b1b3 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -179,7 +179,7 @@ __dma_device_satisfies_mask(struct dma_device *device, static struct module *dma_chan_to_owner(struct dma_chan *chan) { - return chan->device->dev->driver->owner; + return chan->device->owner; } /** @@ -919,6 +919,8 @@ int dma_async_device_register(struct dma_device *device) return -EIO; } + device->owner = device->dev->driver->owner; + if (dma_has_cap(DMA_MEMCPY, device->cap_mask) && !device->device_prep_dma_memcpy) { dev_err(device->dev, "Device claims capability %s, but op is not defined\n", diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 8fcdee1c0cf9..13aa0abb71de 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -674,6 +674,7 @@ struct dma_filter { * @fill_align: alignment shift for memset operations * @dev_id: unique device ID * @dev: struct device reference for dma mapping api + * @owner: owner module (automatically set based on the provided dev) * @src_addr_widths: bit mask of src addr widths the device supports * Width is specified in bytes, e.g. for a device supporting * a width of 4 the mask should have BIT(4) set. @@ -737,6 +738,7 @@ struct dma_device { int dev_id; struct device *dev; + struct module *owner; u32 src_addr_widths; u32 dst_addr_widths; -- 2.20.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-10-22 21:46 ` [PATCH 1/5] dmaengine: Store module owner in dma_device struct Logan Gunthorpe @ 2019-11-09 17:18 ` Vinod Koul 2019-11-11 16:50 ` Logan Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Vinod Koul @ 2019-11-09 17:18 UTC (permalink / raw) To: Logan Gunthorpe; +Cc: linux-kernel, dmaengine, Dan Williams Hi Logan, Sorry for delay in reply! On 22-10-19, 15:46, Logan Gunthorpe wrote: > dma_chan_to_owner() dereferences the driver from the struct device to > obtain the owner and call module_[get|put](). However, if the backing > device is unbound before the dma_device is unregistered, the driver > will be cleared and this will cause a NULL pointer dereference. Have you been able to repro this? If so how..? The expectation is that the driver shall unregister before removed. > > Instead, store a pointer to the owner module in the dma_device struct > so the module reference can be properly put when the channel is put, even > if the backing device was destroyed first. > > This change helps to support a safer unbind of DMA engines. For error cases which should be fixed, so maybe this is a right way and gets things fixed :) > If the dma_device is unregistered in the driver's remove function, > there's no guarantee that there are no existing clients and a users > action may trigger the WARN_ONCE in dma_async_device_unregister() > which is unlikely to leave the system in a consistent state. > Instead, a better approach is to allow the backing driver to go away > and fail any subsequent requests to it. > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > --- > drivers/dma/dmaengine.c | 4 +++- > include/linux/dmaengine.h | 2 ++ > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c > index 03ac4b96117c..4b604086b1b3 100644 > --- a/drivers/dma/dmaengine.c > +++ b/drivers/dma/dmaengine.c > @@ -179,7 +179,7 @@ __dma_device_satisfies_mask(struct dma_device *device, > > static struct module *dma_chan_to_owner(struct dma_chan *chan) > { > - return chan->device->dev->driver->owner; > + return chan->device->owner; > } > > /** > @@ -919,6 +919,8 @@ int dma_async_device_register(struct dma_device *device) > return -EIO; > } > > + device->owner = device->dev->driver->owner; > + > if (dma_has_cap(DMA_MEMCPY, device->cap_mask) && !device->device_prep_dma_memcpy) { > dev_err(device->dev, > "Device claims capability %s, but op is not defined\n", > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h > index 8fcdee1c0cf9..13aa0abb71de 100644 > --- a/include/linux/dmaengine.h > +++ b/include/linux/dmaengine.h > @@ -674,6 +674,7 @@ struct dma_filter { > * @fill_align: alignment shift for memset operations > * @dev_id: unique device ID > * @dev: struct device reference for dma mapping api > + * @owner: owner module (automatically set based on the provided dev) > * @src_addr_widths: bit mask of src addr widths the device supports > * Width is specified in bytes, e.g. for a device supporting > * a width of 4 the mask should have BIT(4) set. > @@ -737,6 +738,7 @@ struct dma_device { > > int dev_id; > struct device *dev; > + struct module *owner; > > u32 src_addr_widths; > u32 dst_addr_widths; > -- > 2.20.1 -- ~Vinod ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-09 17:18 ` Vinod Koul @ 2019-11-11 16:50 ` Logan Gunthorpe 2019-11-12 5:56 ` Vinod Koul 0 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2019-11-11 16:50 UTC (permalink / raw) To: Vinod Koul; +Cc: linux-kernel, dmaengine, Dan Williams On 2019-11-09 10:18 a.m., Vinod Koul wrote: > Hi Logan, > > Sorry for delay in reply! > > On 22-10-19, 15:46, Logan Gunthorpe wrote: >> dma_chan_to_owner() dereferences the driver from the struct device to >> obtain the owner and call module_[get|put](). However, if the backing >> device is unbound before the dma_device is unregistered, the driver >> will be cleared and this will cause a NULL pointer dereference. > > Have you been able to repro this? If so how..? > > The expectation is that the driver shall unregister before removed. Yes, with my new driver, if I do a PCI unbind (which unregisters) while the DMA engine is in use, it panics. The point is the underlying driver can go away before the channel is removed. I suspect this is less of an issue for most devices as they wouldn't normally be unbound while in use (for example there's really no reason to ever unbind IOAT seeing it's built into the system). Though, the fact is, the user could unbind these devices at anytime and we don't want to panic if they do. >> >> Instead, store a pointer to the owner module in the dma_device struct >> so the module reference can be properly put when the channel is put, even >> if the backing device was destroyed first. >> >> This change helps to support a safer unbind of DMA engines. > > For error cases which should be fixed, so maybe this is a right way and > gets things fixed :) Yes, if you'd like to merge the first two patches ahead of the rest, that would make sense to me. Thanks, Logan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-11 16:50 ` Logan Gunthorpe @ 2019-11-12 5:56 ` Vinod Koul 2019-11-12 16:45 ` Logan Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Vinod Koul @ 2019-11-12 5:56 UTC (permalink / raw) To: Logan Gunthorpe; +Cc: linux-kernel, dmaengine, Dan Williams On 11-11-19, 09:50, Logan Gunthorpe wrote: > > > On 2019-11-09 10:18 a.m., Vinod Koul wrote: > > Hi Logan, > > > > Sorry for delay in reply! > > > > On 22-10-19, 15:46, Logan Gunthorpe wrote: > >> dma_chan_to_owner() dereferences the driver from the struct device to > >> obtain the owner and call module_[get|put](). However, if the backing > >> device is unbound before the dma_device is unregistered, the driver > >> will be cleared and this will cause a NULL pointer dereference. > > > > Have you been able to repro this? If so how..? > > > > The expectation is that the driver shall unregister before removed. > > Yes, with my new driver, if I do a PCI unbind (which unregisters) while > the DMA engine is in use, it panics. The point is the underlying driver > can go away before the channel is removed. and in your driver remove you do not unregister? When unbind is invoked the driver remove is invoked by core and you should unregister whatever you have registered in your probe! Said that, if someone is using the dmaengine at that point of time, it is not a nice thing to do and can cause issues, but on idle it should just work! > I suspect this is less of an issue for most devices as they wouldn't > normally be unbound while in use (for example there's really no reason > to ever unbind IOAT seeing it's built into the system). Though, the fact > is, the user could unbind these devices at anytime and we don't want to > panic if they do. There are many drivers which do modules so yes I am expecting unbind and even a bind following that to work -- ~Vinod ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-12 5:56 ` Vinod Koul @ 2019-11-12 16:45 ` Logan Gunthorpe 2019-11-14 4:55 ` Vinod Koul 0 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2019-11-12 16:45 UTC (permalink / raw) To: Vinod Koul; +Cc: linux-kernel, dmaengine, Dan Williams On 2019-11-11 10:56 p.m., Vinod Koul wrote: > On 11-11-19, 09:50, Logan Gunthorpe wrote: >> >> >> On 2019-11-09 10:18 a.m., Vinod Koul wrote: >>> Hi Logan, >>> >>> Sorry for delay in reply! >>> >>> On 22-10-19, 15:46, Logan Gunthorpe wrote: >>>> dma_chan_to_owner() dereferences the driver from the struct device to >>>> obtain the owner and call module_[get|put](). However, if the backing >>>> device is unbound before the dma_device is unregistered, the driver >>>> will be cleared and this will cause a NULL pointer dereference. >>> >>> Have you been able to repro this? If so how..? >>> >>> The expectation is that the driver shall unregister before removed. >> >> Yes, with my new driver, if I do a PCI unbind (which unregisters) while >> the DMA engine is in use, it panics. The point is the underlying driver >> can go away before the channel is removed. > > and in your driver remove you do not unregister? When unbind is invoked > the driver remove is invoked by core and you should unregister whatever > you have registered in your probe! > > Said that, if someone is using the dmaengine at that point of time, it > is not a nice thing to do and can cause issues, but on idle it should > just work! But that's the problem. We can't expect our users to be "nice" and not unbind when the driver is in use. Killing the kernel if the user unexpectedly unbinds is not acceptable. >> I suspect this is less of an issue for most devices as they wouldn't >> normally be unbound while in use (for example there's really no reason >> to ever unbind IOAT seeing it's built into the system). Though, the fact >> is, the user could unbind these devices at anytime and we don't want to >> panic if they do. > > There are many drivers which do modules so yes I am expecting unbind and > even a bind following that to work Except they will panic if they unbind while in use, so that's a questionable definition of "work". Logan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-12 16:45 ` Logan Gunthorpe @ 2019-11-14 4:55 ` Vinod Koul 2019-11-14 17:03 ` Logan Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Vinod Koul @ 2019-11-14 4:55 UTC (permalink / raw) To: Logan Gunthorpe; +Cc: linux-kernel, dmaengine, Dan Williams On 12-11-19, 09:45, Logan Gunthorpe wrote: > > > On 2019-11-11 10:56 p.m., Vinod Koul wrote: > > On 11-11-19, 09:50, Logan Gunthorpe wrote: > >> > >> > >> On 2019-11-09 10:18 a.m., Vinod Koul wrote: > >>> Hi Logan, > >>> > >>> Sorry for delay in reply! > >>> > >>> On 22-10-19, 15:46, Logan Gunthorpe wrote: > >>>> dma_chan_to_owner() dereferences the driver from the struct device to > >>>> obtain the owner and call module_[get|put](). However, if the backing > >>>> device is unbound before the dma_device is unregistered, the driver > >>>> will be cleared and this will cause a NULL pointer dereference. > >>> > >>> Have you been able to repro this? If so how..? > >>> > >>> The expectation is that the driver shall unregister before removed. > >> > >> Yes, with my new driver, if I do a PCI unbind (which unregisters) while > >> the DMA engine is in use, it panics. The point is the underlying driver > >> can go away before the channel is removed. > > > > and in your driver remove you do not unregister? When unbind is invoked > > the driver remove is invoked by core and you should unregister whatever > > you have registered in your probe! > > > > Said that, if someone is using the dmaengine at that point of time, it > > is not a nice thing to do and can cause issues, but on idle it should > > just work! > > But that's the problem. We can't expect our users to be "nice" and not > unbind when the driver is in use. Killing the kernel if the user > unexpectedly unbinds is not acceptable. And that is why we review the code and ensure this does not happen and behaviour is as expected > >> I suspect this is less of an issue for most devices as they wouldn't > >> normally be unbound while in use (for example there's really no reason > >> to ever unbind IOAT seeing it's built into the system). Though, the fact > >> is, the user could unbind these devices at anytime and we don't want to > >> panic if they do. > > > > There are many drivers which do modules so yes I am expecting unbind and > > even a bind following that to work > > Except they will panic if they unbind while in use, so that's a > questionable definition of "work". dmaengine core has module reference so while they are being used they won't be removed (unless I complete misread the driver core behaviour) -- ~Vinod ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-14 4:55 ` Vinod Koul @ 2019-11-14 17:03 ` Logan Gunthorpe 2019-11-22 5:20 ` Vinod Koul 0 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2019-11-14 17:03 UTC (permalink / raw) To: Vinod Koul; +Cc: linux-kernel, dmaengine, Dan Williams On 2019-11-13 9:55 p.m., Vinod Koul wrote: >> But that's the problem. We can't expect our users to be "nice" and not >> unbind when the driver is in use. Killing the kernel if the user >> unexpectedly unbinds is not acceptable. > > And that is why we review the code and ensure this does not happen and > behaviour is as expected Yes, but the current code can kill the kernel when the driver is unbound. >>>> I suspect this is less of an issue for most devices as they wouldn't >>>> normally be unbound while in use (for example there's really no reason >>>> to ever unbind IOAT seeing it's built into the system). Though, the fact >>>> is, the user could unbind these devices at anytime and we don't want to >>>> panic if they do. >>> >>> There are many drivers which do modules so yes I am expecting unbind and >>> even a bind following that to work >> >> Except they will panic if they unbind while in use, so that's a >> questionable definition of "work". > > dmaengine core has module reference so while they are being used they > won't be removed (unless I complete misread the driver core behaviour) Yes, as I mentioned in my other email, holding a module reference does not prevent the driver from being unbound. Any driver can be unbound by the user at any time without the module being removed. Essentially, at any time, a user can do this: echo 0000:83:00.4 > /sys/bus/pci/drivers/plx_dma/unbind Which will call plx_dma_remove() regardless of whether anyone has a reference to the module, and regardless of whether the dma channel is currently in use. I feel it is important that drivers support this without crashing, and my plx_dma driver does the correct thing here. Logan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-14 17:03 ` Logan Gunthorpe @ 2019-11-22 5:20 ` Vinod Koul 2019-11-22 16:53 ` Dave Jiang 0 siblings, 1 reply; 25+ messages in thread From: Vinod Koul @ 2019-11-22 5:20 UTC (permalink / raw) To: Logan Gunthorpe; +Cc: linux-kernel, dmaengine, Dan Williams On 14-11-19, 10:03, Logan Gunthorpe wrote: > > > On 2019-11-13 9:55 p.m., Vinod Koul wrote: > >> But that's the problem. We can't expect our users to be "nice" and not > >> unbind when the driver is in use. Killing the kernel if the user > >> unexpectedly unbinds is not acceptable. > > > > And that is why we review the code and ensure this does not happen and > > behaviour is as expected > > Yes, but the current code can kill the kernel when the driver is unbound. > > >>>> I suspect this is less of an issue for most devices as they wouldn't > >>>> normally be unbound while in use (for example there's really no reason > >>>> to ever unbind IOAT seeing it's built into the system). Though, the fact > >>>> is, the user could unbind these devices at anytime and we don't want to > >>>> panic if they do. > >>> > >>> There are many drivers which do modules so yes I am expecting unbind and > >>> even a bind following that to work > >> > >> Except they will panic if they unbind while in use, so that's a > >> questionable definition of "work". > > > > dmaengine core has module reference so while they are being used they > > won't be removed (unless I complete misread the driver core behaviour) > > Yes, as I mentioned in my other email, holding a module reference does > not prevent the driver from being unbound. Any driver can be unbound by > the user at any time without the module being removed. That sounds okay then. > > Essentially, at any time, a user can do this: > > echo 0000:83:00.4 > /sys/bus/pci/drivers/plx_dma/unbind > > Which will call plx_dma_remove() regardless of whether anyone has a > reference to the module, and regardless of whether the dma channel is > currently in use. I feel it is important that drivers support this > without crashing, and my plx_dma driver does the correct thing here. > > Logan -- ~Vinod ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-22 5:20 ` Vinod Koul @ 2019-11-22 16:53 ` Dave Jiang 2019-11-22 20:50 ` Dan Williams 0 siblings, 1 reply; 25+ messages in thread From: Dave Jiang @ 2019-11-22 16:53 UTC (permalink / raw) To: Vinod Koul, Logan Gunthorpe; +Cc: linux-kernel, dmaengine, Dan Williams On 11/21/19 10:20 PM, Vinod Koul wrote: > On 14-11-19, 10:03, Logan Gunthorpe wrote: >> >> >> On 2019-11-13 9:55 p.m., Vinod Koul wrote: >>>> But that's the problem. We can't expect our users to be "nice" and not >>>> unbind when the driver is in use. Killing the kernel if the user >>>> unexpectedly unbinds is not acceptable. >>> >>> And that is why we review the code and ensure this does not happen and >>> behaviour is as expected >> >> Yes, but the current code can kill the kernel when the driver is unbound. >> >>>>>> I suspect this is less of an issue for most devices as they wouldn't >>>>>> normally be unbound while in use (for example there's really no reason >>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact >>>>>> is, the user could unbind these devices at anytime and we don't want to >>>>>> panic if they do. >>>>> >>>>> There are many drivers which do modules so yes I am expecting unbind and >>>>> even a bind following that to work >>>> >>>> Except they will panic if they unbind while in use, so that's a >>>> questionable definition of "work". >>> >>> dmaengine core has module reference so while they are being used they >>> won't be removed (unless I complete misread the driver core behaviour) >> >> Yes, as I mentioned in my other email, holding a module reference does >> not prevent the driver from being unbound. Any driver can be unbound by >> the user at any time without the module being removed. > > That sounds okay then. I'm actually glad Logan is putting some work in addressing this. I also ran into the same issue as well dealing with unbinds on my new driver. >> >> Essentially, at any time, a user can do this: >> >> echo 0000:83:00.4 > /sys/bus/pci/drivers/plx_dma/unbind >> >> Which will call plx_dma_remove() regardless of whether anyone has a >> reference to the module, and regardless of whether the dma channel is >> currently in use. I feel it is important that drivers support this >> without crashing, and my plx_dma driver does the correct thing here. >> >> Logan > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-22 16:53 ` Dave Jiang @ 2019-11-22 20:50 ` Dan Williams 2019-11-22 20:56 ` Logan Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Dan Williams @ 2019-11-22 20:50 UTC (permalink / raw) To: Dave Jiang Cc: Vinod Koul, Logan Gunthorpe, Linux Kernel Mailing List, dmaengine On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <dave.jiang@intel.com> wrote: > > > > On 11/21/19 10:20 PM, Vinod Koul wrote: > > On 14-11-19, 10:03, Logan Gunthorpe wrote: > >> > >> > >> On 2019-11-13 9:55 p.m., Vinod Koul wrote: > >>>> But that's the problem. We can't expect our users to be "nice" and not > >>>> unbind when the driver is in use. Killing the kernel if the user > >>>> unexpectedly unbinds is not acceptable. > >>> > >>> And that is why we review the code and ensure this does not happen and > >>> behaviour is as expected > >> > >> Yes, but the current code can kill the kernel when the driver is unbound. > >> > >>>>>> I suspect this is less of an issue for most devices as they wouldn't > >>>>>> normally be unbound while in use (for example there's really no reason > >>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact > >>>>>> is, the user could unbind these devices at anytime and we don't want to > >>>>>> panic if they do. > >>>>> > >>>>> There are many drivers which do modules so yes I am expecting unbind and > >>>>> even a bind following that to work > >>>> > >>>> Except they will panic if they unbind while in use, so that's a > >>>> questionable definition of "work". > >>> > >>> dmaengine core has module reference so while they are being used they > >>> won't be removed (unless I complete misread the driver core behaviour) > >> > >> Yes, as I mentioned in my other email, holding a module reference does > >> not prevent the driver from being unbound. Any driver can be unbound by > >> the user at any time without the module being removed. > > > > That sounds okay then. > > I'm actually glad Logan is putting some work in addressing this. I also > ran into the same issue as well dealing with unbinds on my new driver. This was an original mistake of the dmaengine implementation that Vinod inherited. Module pinning is distinct from preventing device unbind which ultimately can't be prevented. Longer term I think we need to audit dmaengine consumers to make sure they are prepared for the driver to be removed similar to how other request based drivers can gracefully return an error status when the device goes away rather than crashing. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-22 20:50 ` Dan Williams @ 2019-11-22 20:56 ` Logan Gunthorpe 2019-11-22 21:01 ` Dan Williams 0 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2019-11-22 20:56 UTC (permalink / raw) To: Dan Williams, Dave Jiang; +Cc: Vinod Koul, Linux Kernel Mailing List, dmaengine On 2019-11-22 1:50 p.m., Dan Williams wrote: > On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <dave.jiang@intel.com> wrote: >> >> >> >> On 11/21/19 10:20 PM, Vinod Koul wrote: >>> On 14-11-19, 10:03, Logan Gunthorpe wrote: >>>> >>>> >>>> On 2019-11-13 9:55 p.m., Vinod Koul wrote: >>>>>> But that's the problem. We can't expect our users to be "nice" and not >>>>>> unbind when the driver is in use. Killing the kernel if the user >>>>>> unexpectedly unbinds is not acceptable. >>>>> >>>>> And that is why we review the code and ensure this does not happen and >>>>> behaviour is as expected >>>> >>>> Yes, but the current code can kill the kernel when the driver is unbound. >>>> >>>>>>>> I suspect this is less of an issue for most devices as they wouldn't >>>>>>>> normally be unbound while in use (for example there's really no reason >>>>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact >>>>>>>> is, the user could unbind these devices at anytime and we don't want to >>>>>>>> panic if they do. >>>>>>> >>>>>>> There are many drivers which do modules so yes I am expecting unbind and >>>>>>> even a bind following that to work >>>>>> >>>>>> Except they will panic if they unbind while in use, so that's a >>>>>> questionable definition of "work". >>>>> >>>>> dmaengine core has module reference so while they are being used they >>>>> won't be removed (unless I complete misread the driver core behaviour) >>>> >>>> Yes, as I mentioned in my other email, holding a module reference does >>>> not prevent the driver from being unbound. Any driver can be unbound by >>>> the user at any time without the module being removed. >>> >>> That sounds okay then. >> >> I'm actually glad Logan is putting some work in addressing this. I also >> ran into the same issue as well dealing with unbinds on my new driver. > > This was an original mistake of the dmaengine implementation that > Vinod inherited. Module pinning is distinct from preventing device > unbind which ultimately can't be prevented. Longer term I think we > need to audit dmaengine consumers to make sure they are prepared for > the driver to be removed similar to how other request based drivers > can gracefully return an error status when the device goes away rather > than crashing. Yes, but that will be a big project because there are a lot of drivers. But I think the dmaengine common code needs to support removal properly, which essentially means changing how all the drivers allocate and free their structures, among other things. The one saving grace is that most of the drivers are for SOCs which can't be physically removed and there's really no use-case for the user to call unbind. Logan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-22 20:56 ` Logan Gunthorpe @ 2019-11-22 21:01 ` Dan Williams 2019-11-22 21:42 ` Dave Jiang 0 siblings, 1 reply; 25+ messages in thread From: Dan Williams @ 2019-11-22 21:01 UTC (permalink / raw) To: Logan Gunthorpe Cc: Dave Jiang, Vinod Koul, Linux Kernel Mailing List, dmaengine On Fri, Nov 22, 2019 at 12:56 PM Logan Gunthorpe <logang@deltatee.com> wrote: > > > > On 2019-11-22 1:50 p.m., Dan Williams wrote: > > On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <dave.jiang@intel.com> wrote: > >> > >> > >> > >> On 11/21/19 10:20 PM, Vinod Koul wrote: > >>> On 14-11-19, 10:03, Logan Gunthorpe wrote: > >>>> > >>>> > >>>> On 2019-11-13 9:55 p.m., Vinod Koul wrote: > >>>>>> But that's the problem. We can't expect our users to be "nice" and not > >>>>>> unbind when the driver is in use. Killing the kernel if the user > >>>>>> unexpectedly unbinds is not acceptable. > >>>>> > >>>>> And that is why we review the code and ensure this does not happen and > >>>>> behaviour is as expected > >>>> > >>>> Yes, but the current code can kill the kernel when the driver is unbound. > >>>> > >>>>>>>> I suspect this is less of an issue for most devices as they wouldn't > >>>>>>>> normally be unbound while in use (for example there's really no reason > >>>>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact > >>>>>>>> is, the user could unbind these devices at anytime and we don't want to > >>>>>>>> panic if they do. > >>>>>>> > >>>>>>> There are many drivers which do modules so yes I am expecting unbind and > >>>>>>> even a bind following that to work > >>>>>> > >>>>>> Except they will panic if they unbind while in use, so that's a > >>>>>> questionable definition of "work". > >>>>> > >>>>> dmaengine core has module reference so while they are being used they > >>>>> won't be removed (unless I complete misread the driver core behaviour) > >>>> > >>>> Yes, as I mentioned in my other email, holding a module reference does > >>>> not prevent the driver from being unbound. Any driver can be unbound by > >>>> the user at any time without the module being removed. > >>> > >>> That sounds okay then. > >> > >> I'm actually glad Logan is putting some work in addressing this. I also > >> ran into the same issue as well dealing with unbinds on my new driver. > > > > This was an original mistake of the dmaengine implementation that > > Vinod inherited. Module pinning is distinct from preventing device > > unbind which ultimately can't be prevented. Longer term I think we > > need to audit dmaengine consumers to make sure they are prepared for > > the driver to be removed similar to how other request based drivers > > can gracefully return an error status when the device goes away rather > > than crashing. > > Yes, but that will be a big project because there are a lot of drivers. Oh yes, in fact I think it's something that can only reasonably be considered for new consumers. > But I think the dmaengine common code needs to support removal properly, > which essentially means changing how all the drivers allocate and free > their structures, among other things. > > The one saving grace is that most of the drivers are for SOCs which > can't be physically removed and there's really no use-case for the user > to call unbind. Yes, the SOC case is not so much my concern as the generic offload use cases, especially if those offloads are in a similar hotplug domain as a cpu. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-22 21:01 ` Dan Williams @ 2019-11-22 21:42 ` Dave Jiang 2019-12-10 9:53 ` Vinod Koul 0 siblings, 1 reply; 25+ messages in thread From: Dave Jiang @ 2019-11-22 21:42 UTC (permalink / raw) To: Dan Williams, Logan Gunthorpe Cc: Vinod Koul, Linux Kernel Mailing List, dmaengine On 11/22/19 2:01 PM, Dan Williams wrote: > On Fri, Nov 22, 2019 at 12:56 PM Logan Gunthorpe <logang@deltatee.com> wrote: >> >> >> >> On 2019-11-22 1:50 p.m., Dan Williams wrote: >>> On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <dave.jiang@intel.com> wrote: >>>> >>>> >>>> >>>> On 11/21/19 10:20 PM, Vinod Koul wrote: >>>>> On 14-11-19, 10:03, Logan Gunthorpe wrote: >>>>>> >>>>>> >>>>>> On 2019-11-13 9:55 p.m., Vinod Koul wrote: >>>>>>>> But that's the problem. We can't expect our users to be "nice" and not >>>>>>>> unbind when the driver is in use. Killing the kernel if the user >>>>>>>> unexpectedly unbinds is not acceptable. >>>>>>> >>>>>>> And that is why we review the code and ensure this does not happen and >>>>>>> behaviour is as expected >>>>>> >>>>>> Yes, but the current code can kill the kernel when the driver is unbound. >>>>>> >>>>>>>>>> I suspect this is less of an issue for most devices as they wouldn't >>>>>>>>>> normally be unbound while in use (for example there's really no reason >>>>>>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact >>>>>>>>>> is, the user could unbind these devices at anytime and we don't want to >>>>>>>>>> panic if they do. >>>>>>>>> >>>>>>>>> There are many drivers which do modules so yes I am expecting unbind and >>>>>>>>> even a bind following that to work >>>>>>>> >>>>>>>> Except they will panic if they unbind while in use, so that's a >>>>>>>> questionable definition of "work". >>>>>>> >>>>>>> dmaengine core has module reference so while they are being used they >>>>>>> won't be removed (unless I complete misread the driver core behaviour) >>>>>> >>>>>> Yes, as I mentioned in my other email, holding a module reference does >>>>>> not prevent the driver from being unbound. Any driver can be unbound by >>>>>> the user at any time without the module being removed. >>>>> >>>>> That sounds okay then. >>>> >>>> I'm actually glad Logan is putting some work in addressing this. I also >>>> ran into the same issue as well dealing with unbinds on my new driver. >>> >>> This was an original mistake of the dmaengine implementation that >>> Vinod inherited. Module pinning is distinct from preventing device >>> unbind which ultimately can't be prevented. Longer term I think we >>> need to audit dmaengine consumers to make sure they are prepared for >>> the driver to be removed similar to how other request based drivers >>> can gracefully return an error status when the device goes away rather >>> than crashing. >> >> Yes, but that will be a big project because there are a lot of drivers. > > Oh yes, in fact I think it's something that can only reasonably be > considered for new consumers. > >> But I think the dmaengine common code needs to support removal properly, >> which essentially means changing how all the drivers allocate and free >> their structures, among other things. >> >> The one saving grace is that most of the drivers are for SOCs which >> can't be physically removed and there's really no use-case for the user >> to call unbind. > > Yes, the SOC case is not so much my concern as the generic offload use > cases, especially if those offloads are in a similar hotplug domain as > a cpu. > It becomes a bigger issue when "channels" can be reconfigured and can come and go in a hot plug fashion. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-11-22 21:42 ` Dave Jiang @ 2019-12-10 9:53 ` Vinod Koul 2019-12-10 17:39 ` Logan Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Vinod Koul @ 2019-12-10 9:53 UTC (permalink / raw) To: Dave Jiang, Dan Williams, Logan Gunthorpe Cc: Linux Kernel Mailing List, dmaengine On 22-11-19, 14:42, Dave Jiang wrote: > > > On 11/22/19 2:01 PM, Dan Williams wrote: > > On Fri, Nov 22, 2019 at 12:56 PM Logan Gunthorpe <logang@deltatee.com> wrote: > > > > > > > > > > > > On 2019-11-22 1:50 p.m., Dan Williams wrote: > > > > On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <dave.jiang@intel.com> wrote: > > > > > > > > > > > > > > > > > > > > On 11/21/19 10:20 PM, Vinod Koul wrote: > > > > > > On 14-11-19, 10:03, Logan Gunthorpe wrote: > > > > > > > > > > > > > > > > > > > > > On 2019-11-13 9:55 p.m., Vinod Koul wrote: > > > > > > > > > But that's the problem. We can't expect our users to be "nice" and not > > > > > > > > > unbind when the driver is in use. Killing the kernel if the user > > > > > > > > > unexpectedly unbinds is not acceptable. > > > > > > > > > > > > > > > > And that is why we review the code and ensure this does not happen and > > > > > > > > behaviour is as expected > > > > > > > > > > > > > > Yes, but the current code can kill the kernel when the driver is unbound. > > > > > > > > > > > > > > > > > > I suspect this is less of an issue for most devices as they wouldn't > > > > > > > > > > > normally be unbound while in use (for example there's really no reason > > > > > > > > > > > to ever unbind IOAT seeing it's built into the system). Though, the fact > > > > > > > > > > > is, the user could unbind these devices at anytime and we don't want to > > > > > > > > > > > panic if they do. > > > > > > > > > > > > > > > > > > > > There are many drivers which do modules so yes I am expecting unbind and > > > > > > > > > > even a bind following that to work > > > > > > > > > > > > > > > > > > Except they will panic if they unbind while in use, so that's a > > > > > > > > > questionable definition of "work". > > > > > > > > > > > > > > > > dmaengine core has module reference so while they are being used they > > > > > > > > won't be removed (unless I complete misread the driver core behaviour) > > > > > > > > > > > > > > Yes, as I mentioned in my other email, holding a module reference does > > > > > > > not prevent the driver from being unbound. Any driver can be unbound by > > > > > > > the user at any time without the module being removed. > > > > > > > > > > > > That sounds okay then. > > > > > > > > > > I'm actually glad Logan is putting some work in addressing this. I also > > > > > ran into the same issue as well dealing with unbinds on my new driver. > > > > > > > > This was an original mistake of the dmaengine implementation that > > > > Vinod inherited. Module pinning is distinct from preventing device > > > > unbind which ultimately can't be prevented. Longer term I think we > > > > need to audit dmaengine consumers to make sure they are prepared for > > > > the driver to be removed similar to how other request based drivers > > > > can gracefully return an error status when the device goes away rather > > > > than crashing. Right finally wrapping my head of static dmaengine devices! we can indeed have devices going away, but me wondering why this should be handled in subsystems! Should the driver core not be doing this instead? Would it be not a safe behaviour for unplug to unload the driver and thus give a chance to unbind from subsystems too... > > > Yes, but that will be a big project because there are a lot of drivers. > > > > Oh yes, in fact I think it's something that can only reasonably be > > considered for new consumers. > > > > > But I think the dmaengine common code needs to support removal properly, > > > which essentially means changing how all the drivers allocate and free > > > their structures, among other things. > > > > > > The one saving grace is that most of the drivers are for SOCs which > > > can't be physically removed and there's really no use-case for the user > > > to call unbind. yeah only a small set of drivers would need this for now! > > Yes, the SOC case is not so much my concern as the generic offload use > > cases, especially if those offloads are in a similar hotplug domain as > > a cpu. > > It becomes a bigger issue when "channels" can be reconfigured and can come > and go in a hot plug fashion. -- ~Vinod ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct 2019-12-10 9:53 ` Vinod Koul @ 2019-12-10 17:39 ` Logan Gunthorpe 0 siblings, 0 replies; 25+ messages in thread From: Logan Gunthorpe @ 2019-12-10 17:39 UTC (permalink / raw) To: Vinod Koul, Dave Jiang, Dan Williams; +Cc: Linux Kernel Mailing List, dmaengine On 2019-12-10 2:53 a.m., Vinod Koul wrote: > On 22-11-19, 14:42, Dave Jiang wrote: >> >> >> On 11/22/19 2:01 PM, Dan Williams wrote: >>> On Fri, Nov 22, 2019 at 12:56 PM Logan Gunthorpe <logang@deltatee.com> wrote: >>>> >>>> >>>> >>>> On 2019-11-22 1:50 p.m., Dan Williams wrote: >>>>> On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <dave.jiang@intel.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 11/21/19 10:20 PM, Vinod Koul wrote: >>>>>>> On 14-11-19, 10:03, Logan Gunthorpe wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 2019-11-13 9:55 p.m., Vinod Koul wrote: >>>>>>>>>> But that's the problem. We can't expect our users to be "nice" and not >>>>>>>>>> unbind when the driver is in use. Killing the kernel if the user >>>>>>>>>> unexpectedly unbinds is not acceptable. >>>>>>>>> >>>>>>>>> And that is why we review the code and ensure this does not happen and >>>>>>>>> behaviour is as expected >>>>>>>> >>>>>>>> Yes, but the current code can kill the kernel when the driver is unbound. >>>>>>>> >>>>>>>>>>>> I suspect this is less of an issue for most devices as they wouldn't >>>>>>>>>>>> normally be unbound while in use (for example there's really no reason >>>>>>>>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact >>>>>>>>>>>> is, the user could unbind these devices at anytime and we don't want to >>>>>>>>>>>> panic if they do. >>>>>>>>>>> >>>>>>>>>>> There are many drivers which do modules so yes I am expecting unbind and >>>>>>>>>>> even a bind following that to work >>>>>>>>>> >>>>>>>>>> Except they will panic if they unbind while in use, so that's a >>>>>>>>>> questionable definition of "work". >>>>>>>>> >>>>>>>>> dmaengine core has module reference so while they are being used they >>>>>>>>> won't be removed (unless I complete misread the driver core behaviour) >>>>>>>> >>>>>>>> Yes, as I mentioned in my other email, holding a module reference does >>>>>>>> not prevent the driver from being unbound. Any driver can be unbound by >>>>>>>> the user at any time without the module being removed. >>>>>>> >>>>>>> That sounds okay then. >>>>>> >>>>>> I'm actually glad Logan is putting some work in addressing this. I also >>>>>> ran into the same issue as well dealing with unbinds on my new driver. >>>>> >>>>> This was an original mistake of the dmaengine implementation that >>>>> Vinod inherited. Module pinning is distinct from preventing device >>>>> unbind which ultimately can't be prevented. Longer term I think we >>>>> need to audit dmaengine consumers to make sure they are prepared for >>>>> the driver to be removed similar to how other request based drivers >>>>> can gracefully return an error status when the device goes away rather >>>>> than crashing. > > Right finally wrapping my head of static dmaengine devices! we can > indeed have devices going away, but me wondering why this should be > handled in subsystems! Should the driver core not be doing this instead? > Would it be not a safe behaviour for unplug to unload the driver and > thus give a chance to unbind from subsystems too... Yes, I think it should be in the core. I was just worried about breaking older drivers. But I'll see if I can move a bit more of the logic for a v3 series. Thanks, Logan ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2019-12-25 2:25 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-16 19:01 [PATCH 0/5] Support hot-unbind in IOAT Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 1/5] dmaengine: Store module owner in dma_device struct Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 2/5] dmaengine: Call module_put() after device_free_chan_resources() Logan Gunthorpe 2019-12-24 4:37 ` Vinod Koul 2019-12-16 19:01 ` [PATCH 3/5] dmaengine: Move dma_channel_rebalance() infrastructure up in code Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 4/5] dmaengine: Add reference counting to dma_device struct Logan Gunthorpe 2019-12-16 19:01 ` [PATCH 5/5] dmaengine: ioat: Support in-use unbind Logan Gunthorpe 2019-12-17 16:33 ` Dave Jiang 2019-12-24 4:50 ` [PATCH 0/5] Support hot-unbind in IOAT Vinod Koul 2019-12-25 2:25 ` Logan Gunthorpe -- strict thread matches above, loose matches on Subject: below -- 2019-10-22 21:46 [PATCH 0/5] PLX Switch DMA Engine Driver Logan Gunthorpe 2019-10-22 21:46 ` [PATCH 1/5] dmaengine: Store module owner in dma_device struct Logan Gunthorpe 2019-11-09 17:18 ` Vinod Koul 2019-11-11 16:50 ` Logan Gunthorpe 2019-11-12 5:56 ` Vinod Koul 2019-11-12 16:45 ` Logan Gunthorpe 2019-11-14 4:55 ` Vinod Koul 2019-11-14 17:03 ` Logan Gunthorpe 2019-11-22 5:20 ` Vinod Koul 2019-11-22 16:53 ` Dave Jiang 2019-11-22 20:50 ` Dan Williams 2019-11-22 20:56 ` Logan Gunthorpe 2019-11-22 21:01 ` Dan Williams 2019-11-22 21:42 ` Dave Jiang 2019-12-10 9:53 ` Vinod Koul 2019-12-10 17:39 ` Logan Gunthorpe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).