From: Andrew Morton <akpm@osdl.org>
To: chris.leech@gmail.com
Cc: christopher.leech@intel.com, jeff@garzik.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
Date: Sun, 5 Mar 2006 00:09:33 -0800 [thread overview]
Message-ID: <20060305000933.2d799138.akpm@osdl.org> (raw)
In-Reply-To: <41b516cb0603031439n13e4df4cg8e5b21b606d2b4b8@mail.gmail.com>
"Chris Leech" <christopher.leech@intel.com> wrote:
>
> > Patch #2 didn't make it. Too big for the list?
>
> Could be, it's the largest of the series. I've attached the gziped
> patch. I can try and split this up for the future.
>
> ..
>
> [I/OAT] Driver for the Intel(R) I/OAT DMA engine
> Adds a new ioatdma driver
>
> ...
> +struct cb_pci_pmcap_register {
> + uint32_t capid:8; /* RO: 01h */
> + uint32_t nxtcapptr:8;
> + uint32_t version:3; /* RO: 010b */
> + uint32_t pmeclk:1; /* RO: 0b */
> + uint32_t reserved:1; /* RV: 0b */
> + uint32_t dsi:1; /* RO: 0b */
> + uint32_t aux_current:3; /* RO: 000b */
> + uint32_t d1_support:1; /* RO: 0b */
> + uint32_t d2_support:1; /* RO: 0b */
> + uint32_t pme_support:5; /* RO: 11001b */
> +};
This maps onto hardware registers? No big-endian plans in Intel's future? ;)
I have a vague feeling that gcc changed its layout of bitfields many years
ago. I guess we're fairly safe against that. Presumably gcc and icc use the
same layout?
Still. It's a bit of a concern, but I guess we can worry about that if it
happens.
> +
> +static inline u8 read_reg8(struct cb_device *device, unsigned int offset)
> +{
> + return readb(device->reg_base + offset);
> +}
These are fairly generic-sounding names. In fact the as-yet-unmerged tiacx
wireless driver is already using these, privately to
drivers/net/wireless/tiacx/pci.c.
> +static int enumerate_dma_channels(struct cb_device *device)
> +{
> + u8 xfercap_scale;
> + u32 xfercap;
> + int i;
> + struct cb_dma_chan *cb_chan;
> +
> + device->common.chancnt = read_reg8(device, CB_CHANCNT_OFFSET);
> + xfercap_scale = read_reg8(device, CB_XFERCAP_OFFSET);
> + xfercap = (xfercap_scale == 0 ? ~0UL : (1 << xfercap_scale));
I recommend using just "-1" to represent the all-ones pattern. It simply
works, in all situations.
Where you _did_ want the UL was after that "1".
> + for (i = 0; i < device->common.chancnt; i++) {
> + cb_chan = kzalloc(sizeof(*cb_chan), GFP_KERNEL);
> + if (!cb_chan)
> + return -ENOMEM;
memory leak?
> + cb_chan->device = device;
> + cb_chan->reg_base = device->reg_base + (0x80 * (i + 1));
> + cb_chan->xfercap = xfercap;
> + spin_lock_init(&cb_chan->cleanup_lock);
> + spin_lock_init(&cb_chan->desc_lock);
> + INIT_LIST_HEAD(&cb_chan->free_desc);
> + INIT_LIST_HEAD(&cb_chan->used_desc);
> + /* This should be made common somewhere in dmaengine.c */
> + cb_chan->common.device = &device->common;
> + cb_chan->common.client = NULL;
> + list_add_tail(&cb_chan->common.device_node, &device->common.channels);
No locking needed for that list?
> +static struct cb_desc_sw * cb_dma_alloc_descriptor(struct cb_dma_chan *cb_chan)
There's a mix of styles here. I don't think the space after the asterisk does
anything useful, and it could be argued that it's incorrect (or misleading)
wrt C declaration semantics.
> +{
> + struct cb_dma_descriptor *desc;
What do all these "cb"'s stand for, anyway?
> + struct cb_desc_sw *desc_sw;
> + struct cb_device *cb_device = to_cb_device(cb_chan->common.device);
> + dma_addr_t phys;
> +
> + desc = pci_pool_alloc(cb_device->dma_pool, GFP_ATOMIC, &phys);
> + if (!desc)
> + return NULL;
> +
> + desc_sw = kzalloc(sizeof(*desc_sw), GFP_ATOMIC);
GFP_ATOMIC is to be avoided if at all possible. It stresses the memory system
and can easily fail under load.
>From my reading, two of the callers could trivially call this function outside
spin_lock_bh() and the third could perhaps do so with a little work. You
could at least fix up two of those callers, and pass in the gfp_flags.
<wonders why the heck dma_pool_alloc() uses SLAB_ATOMIC when the caller's
passing in the gfp_flags>
> +/* returns the actual number of allocated descriptors */
> +static int cb_dma_alloc_chan_resources(struct dma_chan *chan)
> +{
> ...
> + /* Allocate descriptors */
> + spin_lock_bh(&cb_chan->desc_lock);
> + for (i = 0; i < INITIAL_CB_DESC_COUNT; i++) {
> + desc = cb_dma_alloc_descriptor(cb_chan);
> + if (!desc) {
> + printk(KERN_ERR "CB: Only %d initial descriptors\n", i);
> + break;
> + }
> + list_add_tail(&desc->node, &cb_chan->free_desc);
> + }
> + spin_unlock_bh(&cb_chan->desc_lock);
Here's one such caller.
> +
> +static void cb_dma_free_chan_resources(struct dma_chan *chan)
> +{
> + struct cb_dma_chan *cb_chan = to_cb_chan(chan);
> + struct cb_device *cb_device = to_cb_device(chan->device);
> + struct cb_desc_sw *desc, *_desc;
> + u16 chanctrl;
> + int in_use_descs = 0;
> +
> + cb_dma_memcpy_cleanup(cb_chan);
> +
> + chan_write_reg8(cb_chan, CB_CHANCMD_OFFSET, CB_CHANCMD_RESET);
> +
> + spin_lock_bh(&cb_chan->desc_lock);
> + list_for_each_entry_safe(desc, _desc, &cb_chan->used_desc, node) {
> + in_use_descs++;
> + list_del(&desc->node);
> + pci_pool_free(cb_device->dma_pool, desc->hw, desc->phys);
> + kfree(desc);
> + }
> + list_for_each_entry_safe(desc, _desc, &cb_chan->free_desc, node) {
> + list_del(&desc->node);
> + pci_pool_free(cb_device->dma_pool, desc->hw, desc->phys);
> + kfree(desc);
> + }
> + spin_unlock_bh(&cb_chan->desc_lock);
Do we actually need the lock there? If we're freeing everything which it
protects anwyay?
> +
> +static void cb_dma_memcpy_cleanup(struct cb_dma_chan *chan)
> +{
> + unsigned long phys_complete;
> + struct cb_desc_sw *desc, *_desc;
> + dma_cookie_t cookie = 0;
> +
> + prefetch(chan->completion_virt);
> +
> + if (!spin_trylock(&chan->cleanup_lock))
> + return;
What's going on here? Lock ranking problems? spin_trylock() in
non-infrastructural code is a bit of a red flag.
Whatever the reason, it needs a comment in there please. That comment should
also explain why simply baling out is acceptable.
> +
> +static irqreturn_t cb_do_interrupt(int irq, void *data, struct pt_regs *regs)
> +{
> + struct cb_device *instance = data;
> + unsigned long attnstatus;
> + u8 intrctrl;
> +
> + intrctrl = read_reg8(instance, CB_INTRCTRL_OFFSET);
> +
> + if (!(intrctrl & CB_INTRCTRL_MASTER_INT_EN)) {
> + return IRQ_NONE;
> + }
braces.
> + attnstatus = (unsigned long) read_reg32(instance, CB_ATTNSTATUS_OFFSET);
Unneeded cast.
> +static void cb_start_null_desc(struct cb_dma_chan *cb_chan)
> +{
> + struct cb_desc_sw *desc;
> +
> + spin_lock_bh(&cb_chan->desc_lock);
> +
> + if (!list_empty(&cb_chan->free_desc)) {
> + desc = to_cb_desc(cb_chan->free_desc.next);
> + list_del(&desc->node);
> + } else {
> + /* try to get another desc */
> + desc = cb_dma_alloc_descriptor(cb_chan);
> + /* will this ever happen? */
> + BUG_ON(!desc);
> + }
> +
> + desc->hw->ctl = CB_DMA_DESCRIPTOR_NUL;
> + desc->hw->next = 0;
> +
> + list_add_tail(&desc->node, &cb_chan->used_desc);
> +
> +#if (BITS_PER_LONG == 64)
> + chan_write_reg64(cb_chan, CB_CHAINADDR_OFFSET, desc->phys);
> +#else
> + chan_write_reg32(cb_chan, CB_CHAINADDR_OFFSET_LOW, (u32) desc->phys);
> + chan_write_reg32(cb_chan, CB_CHAINADDR_OFFSET_HIGH, 0);
> +#endif
> + chan_write_reg8(cb_chan, CB_CHANCMD_OFFSET, CB_CHANCMD_START);
> +
> + spin_unlock_bh(&cb_chan->desc_lock);
> +}
Can the chan_write*() calls be moved outside the locked region?
> +/*
> + * Perform a CB transaction to verify the HW works.
> + */
Damn, I wish I knew what CB meant.
> +#define CB_TEST_SIZE 2000
> +
> +static int cb_self_test(struct cb_device *device)
> +{
> + int i;
> + u8 *src;
> + u8 *dest;
> + struct dma_chan *dma_chan;
> + dma_cookie_t cookie;
> + int err = 0;
> +
> + src = kzalloc(sizeof(u8) * CB_TEST_SIZE, SLAB_KERNEL);
> + if (!src)
> + return -ENOMEM;
> + dest = kzalloc(sizeof(u8) * CB_TEST_SIZE, SLAB_KERNEL);
> + if (!dest) {
> + kfree(src);
> + return -ENOMEM;
> + }
> +
> + /* Fill in src buffer */
> + for (i = 0; i < CB_TEST_SIZE; i++)
> + src[i] = (u8)i;
memset?
> + /* Start copy, using first DMA channel */
> + dma_chan = container_of(device->common.channels.next, struct dma_chan, device_node);
> +
> + cb_dma_alloc_chan_resources(dma_chan);
cb_dma_alloc_chan_resources() can fail.
> + cookie = cb_dma_memcpy_buf_to_buf(dma_chan, dest, src, CB_TEST_SIZE);
> + cb_dma_memcpy_issue_pending(dma_chan);
> +
> + udelay(1000);
msleep(1) would be preferred.
> +static int __devinit cb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> +{
> + int err;
> + unsigned long mmio_start, mmio_len;
> + void *reg_base;
> + struct cb_device *device;
> +
> + err = pci_enable_device(pdev);
> + if (err)
> + goto err_enable_device;
> +
> + err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
> + if (err)
> + err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
> + if (err)
> + goto err_set_dma_mask;
> +
> + err = pci_request_regions(pdev, cb_pci_drv.name);
> + if (err)
> + goto err_request_regions;
> +
> + mmio_start = pci_resource_start(pdev, 0);
> + mmio_len = pci_resource_len(pdev, 0);
> +
> + reg_base = ioremap(mmio_start, mmio_len);
> + if (!reg_base) {
> + err = -ENOMEM;
> + goto err_ioremap;
> + }
> +
> + device = kzalloc(sizeof(*device), GFP_KERNEL);
> + if (!device) {
> + err = -ENOMEM;
> + goto err_kzalloc;
> + }
> +
> + /* DMA coherent memory pool for DMA descriptor allocations */
> + device->dma_pool = pci_pool_create("dma_desc_pool", pdev,
> + sizeof(struct cb_dma_descriptor), 64, 0);
> + if (!device->dma_pool) {
> + err = -ENOMEM;
> + goto err_dma_pool;
> + }
> +
> + device->completion_pool = pci_pool_create("completion_pool", pdev, sizeof(u64), SMP_CACHE_BYTES, SMP_CACHE_BYTES);
> + if (!device->completion_pool) {
> + err = -ENOMEM;
> + goto err_completion_pool;
> + }
> +
> + device->pdev = pdev;
> + pci_set_drvdata(pdev, device);
> +#ifdef CONFIG_PCI_MSI
> + if (pci_enable_msi(pdev) == 0) {
> + device->msi = 1;
> + } else {
> + device->msi = 0;
> + }
> +#endif
> + err = request_irq(pdev->irq, &cb_do_interrupt, SA_SHIRQ, "ioat",
> + device);
> + if (err)
> + goto err_irq;
> +
> + device->reg_base = reg_base;
> +
> + write_reg8(device, CB_INTRCTRL_OFFSET, CB_INTRCTRL_MASTER_INT_EN);
> + pci_set_master(pdev);
> +
> + INIT_LIST_HEAD(&device->common.channels);
> + enumerate_dma_channels(device);
enumerate_dma_channels() can fail.
> + device->common.device_alloc_chan_resources = cb_dma_alloc_chan_resources;
> + device->common.device_free_chan_resources = cb_dma_free_chan_resources;
> + device->common.device_memcpy_buf_to_buf = cb_dma_memcpy_buf_to_buf;
> + device->common.device_memcpy_buf_to_pg = cb_dma_memcpy_buf_to_pg;
> + device->common.device_memcpy_pg_to_pg = cb_dma_memcpy_pg_to_pg;
> + device->common.device_memcpy_complete = cb_dma_is_complete;
> + device->common.device_memcpy_issue_pending = cb_dma_memcpy_issue_pending;
> + printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
> + device->common.chancnt);
> +
> + if ((err = cb_self_test(device)))
> + goto err_self_test;
> +
> + dma_async_device_register(&device->common);
> +
> + return 0;
> +
> +err_self_test:
> +err_irq:
> + pci_pool_destroy(device->completion_pool);
> +err_completion_pool:
> + pci_pool_destroy(device->dma_pool);
> +err_dma_pool:
> + kfree(device);
> +err_kzalloc:
> + iounmap(reg_base);
> +err_ioremap:
> + pci_release_regions(pdev);
> +err_request_regions:
> +err_set_dma_mask:
You might want a pci_disable_device() in here.
> +err_enable_device:
> + return err;
> +}
> +
> +static void __devexit cb_remove(struct pci_dev *pdev)
> +{
> + struct cb_device *device;
> +
> + device = pci_get_drvdata(pdev);
> + dma_async_device_unregister(&device->common);
pci_disable_device()?
> + free_irq(device->pdev->irq, device);
> +#ifdef CONFIG_PCI_MSI
> + if (device->msi)
> + pci_disable_msi(device->pdev);
> +#endif
> + pci_pool_destroy(device->dma_pool);
> + pci_pool_destroy(device->completion_pool);
> + iounmap(device->reg_base);
> + pci_release_regions(pdev);
> + kfree(device);
> +}
> +
> +/* MODULE API */
> +MODULE_VERSION("1.0");
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Intel Corporation");
> +
> +static int __init cb_init_module(void)
> +{
> + /* it's currently unsafe to unload this module */
> + /* if forced, worst case is that rmmod hangs */
How come?
> + if (THIS_MODULE != NULL)
> + THIS_MODULE->unsafe = 1;
> +
> + return pci_module_init(&cb_pci_drv);
> +}
> +
> +#define CB_LOW_COMPLETION_MASK 0xffffffc0
> +
> +extern struct list_head dma_device_list;
> +extern struct list_head dma_client_list;
It's strange to see extern decls for lists, but no decl for their lock. A
comment might help.
> +struct cb_dma_chan {
> +
> + void *reg_base;
> +
> + dma_cookie_t completed_cookie;
> + unsigned long last_completion;
> +
> + u32 xfercap; /* XFERCAP register value expanded out */
> +
> + spinlock_t cleanup_lock;
> + spinlock_t desc_lock;
> + struct list_head free_desc;
> + struct list_head used_desc;
> +
> + int pending;
> +
> + struct cb_device *device;
> + struct dma_chan common;
> +
> + dma_addr_t completion_addr;
> + union {
> + u64 full; /* HW completion writeback */
> + struct {
> + u32 low;
> + u32 high;
> + };
> + } *completion_virt;
> +};
Again, is it safe to assume that these parts will never be present in
big-endian machines?
next prev parent reply other threads:[~2006-03-05 8:11 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-03-03 21:40 [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT) Chris Leech
2006-03-03 21:42 ` [PATCH 1/8] [I/OAT] DMA memcpy subsystem Chris Leech
2006-03-04 1:40 ` David S. Miller
2006-03-06 19:39 ` Chris Leech
2006-03-04 19:20 ` Benjamin LaHaise
2006-03-06 19:48 ` Chris Leech
2006-03-03 21:42 ` [PATCH 3/8] [I/OAT] Setup the networking subsystem as a DMA client Chris Leech
2006-03-03 21:42 ` [PATCH 4/8] [I/OAT] Utility functions for offloading sk_buff to iovec copies Chris Leech
2006-03-05 7:15 ` Andrew Morton
2006-03-03 21:42 ` [PATCH 5/8] [I/OAT] Structure changes for TCP recv offload to I/OAT Chris Leech
2006-03-05 7:19 ` Andrew Morton
2006-03-03 21:42 ` [PATCH 6/8] [I/OAT] Rename cleanup_rbuf to tcp_cleanup_rbuf and make non-static Chris Leech
2006-03-03 21:42 ` [PATCH 7/8] [I/OAT] Add a sysctl for tuning the I/OAT offloaded I/O threshold Chris Leech
2006-03-04 11:22 ` Alexey Dobriyan
2006-03-05 7:21 ` Andrew Morton
2006-03-03 21:42 ` [PATCH 8/8] [I/OAT] TCP recv offload to I/OAT Chris Leech
2006-03-04 16:39 ` Pavel Machek
2006-03-04 23:18 ` Greg KH
2006-03-06 19:28 ` Chris Leech
2006-03-05 7:30 ` Andrew Morton
2006-03-05 8:45 ` Andrew Morton
2006-03-05 10:27 ` David S. Miller
2006-03-06 19:36 ` Chris Leech
2006-03-03 22:27 ` [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT) Jeff Garzik
2006-03-03 22:39 ` Chris Leech
2006-03-03 22:45 ` Jeff Garzik
2006-03-04 11:35 ` Evgeniy Polyakov
2006-03-05 8:09 ` Andrew Morton [this message]
2006-03-05 9:02 ` Discourage duplicate symbols in the kernel? [Was: Intel I/O Acc...] Sam Ravnborg
2006-03-05 9:18 ` Andrew Morton
2006-03-06 19:56 ` Chris Leech
2006-03-03 22:58 ` [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT) Kumar Gala
2006-03-03 23:32 ` Chris Leech
2006-03-04 18:46 ` Jan Engelhardt
2006-03-04 21:41 ` David S. Miller
2006-03-04 22:05 ` Gene Heskett
2006-03-04 22:16 ` David S. Miller
2006-03-05 13:45 ` Jan Engelhardt
2006-03-05 13:55 ` Arjan van de Ven
2006-03-05 16:14 ` Matthieu CASTET
2006-03-05 16:30 ` Jeff Garzik
2006-03-06 19:24 ` Chris Leech
2006-03-06 19:15 ` Chris Leech
2006-03-05 1:43 ` Evgeniy Polyakov
2006-03-05 2:08 ` David S. Miller
2006-03-06 17:44 ` Ingo Oeser
2006-03-07 7:44 ` Evgeniy Polyakov
2006-03-07 9:43 ` Ingo Oeser
2006-03-07 10:16 ` Evgeniy Polyakov
2006-03-11 2:27 Chris Leech
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060305000933.2d799138.akpm@osdl.org \
--to=akpm@osdl.org \
--cc=chris.leech@gmail.com \
--cc=christopher.leech@intel.com \
--cc=jeff@garzik.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).