* problem with blk_queue_bounce_limit() @ 2003-06-05 6:42 David Mosberger 2003-06-05 7:20 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: David Mosberger @ 2003-06-05 6:42 UTC (permalink / raw) To: axboe; +Cc: linux-kernel On platforms with I/O MMU hardware, memory above 4GB, and IDE hard disks, this check: BUG_ON(dma_addr < BLK_BOUNCE_ISA); causes an instant panic. The reason is quite obvious: since there is an I/O MMU, BLK_BOUNCE_ISA is effectively unlimited, and most IDE controllers can of course DMA only to <4GB. So, the check is wrong. I think the proper way to fix this is to pass a "struct dev" into the routine and then to use dma_supported() to check whether bounce buffers will be needed. Do you agree? If so, can you fix it? Thanks, --david ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-05 6:42 problem with blk_queue_bounce_limit() David Mosberger @ 2003-06-05 7:20 ` David S. Miller 2003-06-06 6:42 ` David Mosberger 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2003-06-05 7:20 UTC (permalink / raw) To: davidm; +Cc: axboe, linux-kernel On Wed, 2003-06-04 at 23:42, David Mosberger wrote: > On platforms with I/O MMU hardware, memory above 4GB, and IDE hard disks, > this check: > > BUG_ON(dma_addr < BLK_BOUNCE_ISA); Doesn't panic on sparc64, let this be your guiding light :-) -- David S. Miller <davem@redhat.com> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-05 7:20 ` David S. Miller @ 2003-06-06 6:42 ` David Mosberger 2003-06-06 6:45 ` David S. Miller 2003-06-06 6:52 ` David S. Miller 0 siblings, 2 replies; 29+ messages in thread From: David Mosberger @ 2003-06-06 6:42 UTC (permalink / raw) To: manfred; +Cc: axboe, davidm, David S. Miller, linux-kernel >>>>> On 05 Jun 2003 00:20:53 -0700, "David S. Miller" <davem@redhat.com> said: DaveM> On Wed, 2003-06-04 at 23:42, David Mosberger wrote: >> On platforms with I/O MMU hardware, memory above 4GB, and IDE hard disks, >> this check: >> BUG_ON(dma_addr < BLK_BOUNCE_ISA); DaveM> Doesn't panic on sparc64, let this be your guiding light :-) I checked with Dave in private email and, as I suspected, this is a (potential) problem on sparc64 as well. Let me try to be even more clear: on many 64 bit platforms, BLK_BOUNCE_ISA will be bigger than 4GB, so the BUG_ON() will trigger for IDE controllers that can DMA only to, say, 4GB (and it's probably not even an IDE-only problem, I think we don't see it on SCSI because our machines have SCSI controllers that can DMA >4GB). Manfred, I've readdressed this mail to you because according to google, you're the original author of the patch (http://www.cs.helsinki.fi/linux/linux-kernel/2002-02/0032.html). As I stated earlier, I think this code simply makes no sense on a platform with an I/O MMU. Hence my suggestion to deal with this via dma_supported(). --david ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 6:42 ` David Mosberger @ 2003-06-06 6:45 ` David S. Miller 2003-06-06 6:54 ` David Mosberger 2003-06-06 6:52 ` David S. Miller 1 sibling, 1 reply; 29+ messages in thread From: David S. Miller @ 2003-06-06 6:45 UTC (permalink / raw) To: davidm, davidm; +Cc: manfred, axboe, linux-kernel From: David Mosberger <davidm@napali.hpl.hp.com> Date: Thu, 5 Jun 2003 23:42:17 -0700 Manfred, I'm readdressed this mail to you because according to google, you're the original author of the patch (http://www.cs.helsinki.fi/linux/linux-kernel/2002-02/0032.html). Like I stated earlier, I think this code simply makes no sense on a platform with I/O MMU. Hence my suggestion to deal with this via dma_supported(). David, what is blk_max_low_pfn set to on your ia64 systems? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 6:45 ` David S. Miller @ 2003-06-06 6:54 ` David Mosberger 2003-06-06 7:08 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: David Mosberger @ 2003-06-06 6:54 UTC (permalink / raw) To: David S. Miller; +Cc: davidm, manfred, axboe, linux-kernel >>>>> On Thu, 05 Jun 2003 23:45:26 -0700 (PDT), "David S. Miller" <davem@redhat.com> said: >> From: David Mosberger <davidm@napali.hpl.hp.com> >> Date: Thu, 5 Jun 2003 23:42:17 -0700 >> Manfred, I'm readdressed this mail to you because according to google, >> you're the original author of the patch >> (http://www.cs.helsinki.fi/linux/linux-kernel/2002-02/0032.html). >> Like I stated earlier, I think this code simply makes no sense on a >> platform with I/O MMU. Hence my suggestion to deal with this via >> dma_supported(). > David, what is blk_max_low_pfn set to on your ia64 systems? max_low_pfn, which is going to be the pfn of some page > 4GB (for machines with a sufficient amount of memory). --david ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 6:54 ` David Mosberger @ 2003-06-06 7:08 ` David S. Miller 0 siblings, 0 replies; 29+ messages in thread From: David S. Miller @ 2003-06-06 7:08 UTC (permalink / raw) To: davidm, davidm; +Cc: manfred, axboe, linux-kernel From: David Mosberger <davidm@napali.hpl.hp.com> Date: Thu, 5 Jun 2003 23:54:50 -0700 >>>>> On Thu, 05 Jun 2003 23:45:26 -0700 (PDT), "David S. Miller" <davem@redhat.com> said: > David, what is blk_max_low_pfn set to on your ia64 systems? max_low_pfn, which is going to be the pfn of some page > 4GB (for machines with a sufficient amount of memory). Right, but see my other email, you need to set PCI_DMA_BUS_IS_PHYS properly. This tells the block layer whether you have an IOMMU or not. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 6:42 ` David Mosberger 2003-06-06 6:45 ` David S. Miller @ 2003-06-06 6:52 ` David S. Miller 2003-06-06 7:19 ` David Mosberger 1 sibling, 1 reply; 29+ messages in thread From: David S. Miller @ 2003-06-06 6:52 UTC (permalink / raw) To: davidm, davidm; +Cc: manfred, axboe, linux-kernel Ok, ide-lib.c can call blk_queue_bounce_limit with only 3 different args: 1) BLK_BOUNCE_HIGH (--> blk_max_low_pfn) 2) BLK_BOUNCE_ANY (--> blk_max_pfn) 3) pci_dev->dma_mask David, you're not setting PCI_DMA_BUS_IS_PHYS properly. For IOMMU it should be set to zero. This tells the block layer whether you have an IOMMU or not. I knew it would be a bug like this. Anyways, the problem case is #3 above and that will never happen if you start setting PCI_DMA_BUS_IS_PHYS to zero as appropriate. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 6:52 ` David S. Miller @ 2003-06-06 7:19 ` David Mosberger 2003-06-06 7:32 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: David Mosberger @ 2003-06-06 7:19 UTC (permalink / raw) To: David S. Miller; +Cc: davidm, manfred, axboe, linux-kernel >>>>> On Thu, 05 Jun 2003 23:52:49 -0700 (PDT), "David S. Miller" <davem@redhat.com> said: David> you're not setting PCI_DMA_BUS_IS_PHYS properly. For IOMMU David> it should be set to zero. This tells the block layer if David> you're IOMMU or not. Ah, yes, thanks for pointing this out. PCI_DMA_BUS_IS_PHYS (and its description) is quite misleading: it claims that it has something to do with there being an equivalence between PCI bus and physical addresses. That's actually the case for (small) ia64 platforms, so that's why we ended up setting it to 1. --david ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 7:19 ` David Mosberger @ 2003-06-06 7:32 ` David S. Miller 2003-06-06 7:44 ` Christoph Hellwig 2003-06-06 20:13 ` David Mosberger 0 siblings, 2 replies; 29+ messages in thread From: David S. Miller @ 2003-06-06 7:32 UTC (permalink / raw) To: davidm, davidm; +Cc: manfred, axboe, linux-kernel From: David Mosberger <davidm@napali.hpl.hp.com> Date: Fri, 6 Jun 2003 00:19:08 -0700 PCI_DMA_BUS_IS_PHYS (and it's description) is quite misleading: it claims that it has something to do with there being an equivalence between PCI bus and physical addresses. That's actually the case for (small) ia64 platforms so that's why we ended up setting it to 1. It does have to do with such an equivalence. If your port couldn't work if drivers use the deprecated virt_to_bus/bus_to_virt, you must set PCI_DMA_BUS_IS_PHYS to zero. The whole block layer makes all kinds of assumptions about what physically contiguous addresses mean about how they'll be contiguous in the bus addresses the device will actually use to perform the DMA transfer. Likewise, when PCI_DMA_BUS_IS_PHYS is zero, it knows that many IOMMU'ish things can occur such as taking non-physically-contiguous pages and mapping them using the IOMMU to create bus-contiguous mappings. We could convert the few compile time checks of PCI_DMA_BUS_IS_PHYS so that you can set this based upon the configuration of the machine if for some configurations it is true. drivers/net/tg3.c is the only offender, my bad :-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 7:32 ` David S. Miller @ 2003-06-06 7:44 ` Christoph Hellwig 2003-06-06 7:43 ` David S. Miller 2003-06-06 20:13 ` David Mosberger 1 sibling, 1 reply; 29+ messages in thread From: Christoph Hellwig @ 2003-06-06 7:44 UTC (permalink / raw) To: David S. Miller; +Cc: davidm, davidm, manfred, axboe, linux-kernel On Fri, Jun 06, 2003 at 12:32:30AM -0700, David S. Miller wrote: > We could convert the few compile time checks of PCI_DMA_BUS_IS_PHYS > so that you can set this based upon the configuration of the machine > if for some configurations it is true. drivers/net/tg3.c is the > only offender, my bad :-) That reminds me of another thing that came up when looking over the scsi bounce limit setting code. All this bounce code is based purely on a scsi host and, if existent, its struct device - imo PCI_DMA_BUS_IS_PHYS should be a property of each struct device because a machine might have an iommu for one bus type but not another, e.g. dma_is_phys(dev); For those arches that need it this could be expanded to per-bus code; for all those where everything is either phys or not it would be a simple define that ignores the argument and still produces "perfect" code. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 7:44 ` Christoph Hellwig @ 2003-06-06 7:43 ` David S. Miller 2003-06-06 7:51 ` Christoph Hellwig 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2003-06-06 7:43 UTC (permalink / raw) To: hch; +Cc: davidm, davidm, manfred, axboe, linux-kernel From: Christoph Hellwig <hch@infradead.org> Date: Fri, 6 Jun 2003 08:44:10 +0100 PCI_DMA_BUS_IS_PHYS should be a propert of each struct device because a machine might have a iommu for one bus type but not another, We know of no such systems. Even in mixed-bus environments such as sparc64 SBUS+PCI systems. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 7:43 ` David S. Miller @ 2003-06-06 7:51 ` Christoph Hellwig 0 siblings, 0 replies; 29+ messages in thread From: Christoph Hellwig @ 2003-06-06 7:51 UTC (permalink / raw) To: David S. Miller; +Cc: davidm, davidm, manfred, axboe, linux-kernel On Fri, Jun 06, 2003 at 12:43:05AM -0700, David S. Miller wrote: > From: Christoph Hellwig <hch@infradead.org> > Date: Fri, 6 Jun 2003 08:44:10 +0100 > > PCI_DMA_BUS_IS_PHYS should be a propert of each struct device because > a machine might have a iommu for one bus type but not another, > > We know of no such systems. Even in mixed-bus environments such as > sparc64 SBUS+PCI systems. What about alpha systems with EISA and PCI slots? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 7:32 ` David S. Miller 2003-06-06 7:44 ` Christoph Hellwig @ 2003-06-06 20:13 ` David Mosberger 2003-06-07 6:44 ` David S. Miller 1 sibling, 1 reply; 29+ messages in thread From: David Mosberger @ 2003-06-06 20:13 UTC (permalink / raw) To: David S. Miller; +Cc: davidm, davidm, manfred, axboe, linux-kernel >>>>> On Fri, 06 Jun 2003 00:32:30 -0700 (PDT), "David S. Miller" <davem@redhat.com> said: David> From: David Mosberger <davidm@napali.hpl.hp.com> Date: David> Fri, 6 Jun 2003 00:19:08 -0700 David> PCI_DMA_BUS_IS_PHYS (and it's description) is quite David> misleading: it claims that it has something to do with there David> being an equivalence between PCI bus and physical addresses. David> That's actually the case for (small) ia64 platforms so that's David> why we ended up setting it to 1. David> It does have to do with such an equivalence. If your port David> couldn't work if drivers use the deprecated David> virt_to_bus/bus_to_virt, you must set PCI_DMA_BUS_IS_PHYS to David> zero. Yes, but the comment certainly is confusing. How about something like this:

/*
 * PCI_DMA_BUS_IS_PHYS should be set to 1 if there is necessarily a
 * direct correspondence between device bus addresses and CPU physical
 * addresses.  Platforms with a hardware I/O MMU _must_ turn this off
 * to suppress the bounce buffer handling code in the block and
 * network device layers.  Platforms with separate bus address spaces
 * _must_ turn this off and provide a device DMA mapping
 * implementation that takes care of the necessary address
 * translation.
 */
#define PCI_DMA_BUS_IS_PHYS	pci_dma_bus_is_phys

David> The whole block layer makes all kinds of assumptions about David> what physically contiguous addresses mean about how they'll David> be contiguous in the bus addresses the device will actually David> use to perform the DMA transfer. 
This sounds all very dramatic, but try as I might, all I find is three places where PCI_DMA_BUS_IS_PHYS is used:

 - ide-lib.c: used to disable bounce buffering
 - scsi_lib.c: used to disable bounce buffering
 - tg3.c: what the heck??

The tg3 code looks ugly in the extreme (sorry). If I understand it right, it's trying to work around a bug which shows up when a packet covers a certain address? With an I/O MMU, you then remap the offending buffer again before freeing the old mapping, which ensures a different address for the packet, whereas in the non-I/O MMU case, you copy the entire socket buffer (since just remapping it won't change the address and since there is no interface to copy just a portion of a socket buffer) and then do the remapping. Did I get this right (or at least close enough)? It seems really bad to me to rely on implementation-specifics of the DMA API. I suspect the code would break on a platform which has separate bus address spaces but no (hardware) I/O TLB? (Yeah, probably not a supported scenario, but with a proper fix, this wouldn't be a problem.) Does this bug happen often enough that it's performance critical? Otherwise, you could just always use the copy-the-entire-buffer workaround. If it's performance-critical, would it make sense to extend the socket buffer API to allow copying a portion of a buffer? I really dislike PCI_DMA_BUS_IS_PHYS, because it introduces a discontinuity. I don't think it should be necessary. If it wasn't for tg3.c, couldn't PCI_DMA_BUS_IS_PHYS be gotten rid of much more cleanly with a dma_max_phys_addr(dev) function, which would return the maximum physical address that device DEV can address (either directly, or via an I/O TLB)? David> We could convert the few compile time checks of David> PCI_DMA_BUS_IS_PHYS so that you can set this based upon the David> configuration of the machine if for some configurations it is David> true. drivers/net/tg3.c is the only offender, my bad :-) Yes. Would you mind fixing that? 
--david ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-06 20:13 ` David Mosberger @ 2003-06-07 6:44 ` David S. Miller 2003-06-07 7:05 ` David Mosberger 2003-06-07 7:20 ` David Mosberger 0 siblings, 2 replies; 29+ messages in thread From: David S. Miller @ 2003-06-07 6:44 UTC (permalink / raw) To: davidm, davidm; +Cc: manfred, axboe, linux-kernel From: David Mosberger <davidm@napali.hpl.hp.com> Date: Fri, 6 Jun 2003 13:13:38 -0700 Yes, but the comment certainly is confusing. How about something like this: No arguments. David> The whole block layer makes all kinds of assumptions about David> what physically contiguous addresses mean about how they'll David> be contiguous in the bus addresses the device will actually David> use to perform the DMA transfer. This sounds all very dramatic, but try as I might, all I find is three places where PCI_DMA_BUS_IS_PHYS is used: - ide-lib.c: used to disable bounce buffering - scsi_lib.c: used to disable bounce buffering Fix your grep, - tg3.c: what the heck?? In order to work around a "just below 4GB dma address" bug in the chip, we have to have a reliable way to "remap" the given networking buffer to some other DMA address that does not meet the hw bug case. If we don't have an IOMMU, we have to change the buffer itself and we accomplish this with SKB copy. But on an IOMMU system, we could end up mapping to the same bogus DMA address. So we have to solve this problem by keeping the existing bad mapping, doing a new DMA mapping, then throwing away the old one. Did I get this right (or at least close enough)? Precisely. Otherwise, you could just always use the copy-the-entire-buffer workaround. The new PCI dma mapping I make could map to the SAME bad DMA address, that's the problem. I could loop forever making new DMA mappings on an IOMMU system, each and every one falls into the hw bug case. I really dislike PCI_DMA_BUS_IS_PHYS, because it introduces a discontinuity. I don't think it should be necessary. I totally disagree. 
David> We could convert the few compile time checks of David> PCI_DMA_BUS_IS_PHYS so that you can set this based upon the David> configuration of the machine if for some configurations it is David> true. drivers/net/tg3.c is the only offender, my bad :-) Yes. Would you mind fixing that? Sure, no problem. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-07 6:44 ` David S. Miller @ 2003-06-07 7:05 ` David Mosberger 2003-06-07 7:11 ` David S. Miller 2003-06-07 7:20 ` David Mosberger 1 sibling, 1 reply; 29+ messages in thread From: David Mosberger @ 2003-06-07 7:05 UTC (permalink / raw) To: David S. Miller; +Cc: davidm, manfred, axboe, linux-kernel >>>>> On Fri, 06 Jun 2003 23:44:01 -0700 (PDT), "David S. Miller" <davem@redhat.com> said: DaveM> But on an IOMMU system, we could end up mapping to the same DaveM> bogus DMA address. Ah, yes, just changing the buffer address doesn't guarantee a different bus address. I missed that. DaveM> So we have to solve this problem by keeping the existng bad DaveM> mapping, doing a new DMA mapping, then trowing away the old DaveM> one. But you're creating a new mapping for the old buffer. What if you had a DMA API implementation which consolidates multiple mapping attempts of the same buffer into a single mapping entry (along with a reference count)? That would break the workaround. Isn't the proper fix to (a) get a new buffer, (b) create a mapping for the new buffer, (c) destroy the mapping for the old buffer. That should guarantee a different bus address, no matter what the DMA-mapping implementation. Plus then you don't have to rely on PCI_DMA_BUS_IS_PHYS. --david ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-07 7:05 ` David Mosberger @ 2003-06-07 7:11 ` David S. Miller 2003-06-10 20:01 ` David Mosberger 2003-06-15 7:06 ` Anton Blanchard 0 siblings, 2 replies; 29+ messages in thread From: David S. Miller @ 2003-06-07 7:11 UTC (permalink / raw) To: davidm, davidm; +Cc: manfred, axboe, linux-kernel From: David Mosberger <davidm@napali.hpl.hp.com> Date: Sat, 7 Jun 2003 00:05:06 -0700 But you're creating a new mapping for the old buffer. What if you had a DMA API implementation which consolidates multiple mapping attempts of the same buffer into a single mapping entry (along with a reference count)? That would break the workaround. I hope nobody is doing this, it would probably break other things we haven't considered yet. You can't support all the BIO_VMERGE_BOUNDARY stuff properly in such a scheme. And you _WANT_ to support that when you have an IOMMU, it shrinks the DMA descriptor addr/len entries a chip has to DMA for each block I/O considerably. Isn't the proper fix to (a) get a new buffer, (b) create a mapping for the new buffer, (c) destroy the mapping for the old buffer. That should guarantee a different bus address, no matter what the DMA-mapping implementation. I suppose this would work, feel free to code this up for the tg3 driver for me because I certainly lack the time to do this. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-07 7:11 ` David S. Miller @ 2003-06-10 20:01 ` David Mosberger 2003-06-12 6:47 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: David Mosberger @ 2003-06-10 20:01 UTC (permalink / raw) To: David S. Miller; +Cc: linux-kernel, davidm >>>>> On Sat, 07 Jun 2003 00:11:40 -0700 (PDT), "David S. Miller" <davem@redhat.com> said: >> From: David Mosberger <davidm@napali.hpl.hp.com> Date: Sat, 7 >> Jun 2003 00:05:06 -0700 >> Isn't the proper fix to (a) get a new buffer, (b) create a >> mapping for the new buffer, (c) destroy the mapping for the old >> buffer. That should guarantee a different bus address, no matter >> what the DMA-mapping implementation. > I suppose this would work, fell free to code this up for the tg3 > driver for me because I certainly lack the time to do this. How about the attached patch? The non-PCI_DMA_BUS_IS_PHYS code should work already, because it creates the new mapping before destroying the old one(s). The patch has been compile-tested but not runtime-tested: I can't trigger the bug on ia64, because there, the end of the 4GB space is reserved for firmware purposes, so we'll never have available memory near address 0xffffdcc0. The performance impact should be negligible, as the probability of hitting the bug case should be vanishingly small. 
--david

===== drivers/net/tg3.c 1.68 vs edited =====
--- 1.68/drivers/net/tg3.c	Wed Apr 23 20:02:11 2003
+++ edited/drivers/net/tg3.c	Tue Jun 10 12:53:15 2003
@@ -2234,73 +2234,17 @@
 	schedule_work(&tp->reset_task);
 }
 
-#if !PCI_DMA_BUS_IS_PHYS
-static void tg3_set_txd_addr(struct tg3 *tp, int entry, dma_addr_t mapping)
-{
-	if (tp->tg3_flags & TG3_FLAG_HOST_TXDS) {
-		struct tg3_tx_buffer_desc *txd = &tp->tx_ring[entry];
-
-		txd->addr_hi = ((u64) mapping >> 32);
-		txd->addr_lo = ((u64) mapping & 0xffffffff);
-	} else {
-		unsigned long txd;
-
-		txd = (tp->regs +
-		       NIC_SRAM_WIN_BASE +
-		       NIC_SRAM_TX_BUFFER_DESC);
-		txd += (entry * TXD_SIZE);
-
-		if (sizeof(dma_addr_t) != sizeof(u32))
-			writel(((u64) mapping >> 32),
-			       txd + TXD_ADDR + TG3_64BIT_REG_HIGH);
-
-		writel(((u64) mapping & 0xffffffff),
-		       txd + TXD_ADDR + TG3_64BIT_REG_LOW);
-	}
-}
-#endif
-
 static void tg3_set_txd(struct tg3 *, int, dma_addr_t, int, u32, u32);
 
 static int tigon3_4gb_hwbug_workaround(struct tg3 *tp, struct sk_buff *skb,
 				       u32 guilty_entry, int guilty_len,
 				       u32 last_plus_one, u32 *start, u32 mss)
 {
+	struct sk_buff *new_skb = skb_copy(skb, GFP_ATOMIC);
 	dma_addr_t new_addr;
 	u32 entry = *start;
 	int i;
 
-#if !PCI_DMA_BUS_IS_PHYS
-	/* IOMMU, just map the guilty area again which is guaranteed to
-	 * use different addresses.
-	 */
-	i = 0;
-	while (entry != guilty_entry) {
-		entry = NEXT_TX(entry);
-		i++;
-	}
-	if (i == 0) {
-		new_addr = pci_map_single(tp->pdev, skb->data, guilty_len,
-					  PCI_DMA_TODEVICE);
-	} else {
-		skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];
-
-		new_addr = pci_map_page(tp->pdev,
-					frag->page, frag->page_offset,
-					guilty_len, PCI_DMA_TODEVICE);
-	}
-	pci_unmap_single(tp->pdev, pci_unmap_addr(&tp->tx_buffers[guilty_entry],
-						  mapping),
-			 guilty_len, PCI_DMA_TODEVICE);
-	tg3_set_txd_addr(tp, guilty_entry, new_addr);
-	pci_unmap_addr_set(&tp->tx_buffers[guilty_entry], mapping,
-			   new_addr);
-	*start = last_plus_one;
-#else
-	/* Oh well, no IOMMU, have to allocate a whole new SKB. */
-	struct sk_buff *new_skb = skb_copy(skb, GFP_ATOMIC);
-
 	if (!new_skb) {
 		dev_kfree_skb(skb);
 		return -1;
@@ -2337,7 +2281,6 @@
 	}
 
 	dev_kfree_skb(skb);
-#endif
 
 	return 0;
 }

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-10 20:01 ` David Mosberger @ 2003-06-12 6:47 ` David S. Miller 0 siblings, 0 replies; 29+ messages in thread From: David S. Miller @ 2003-06-12 6:47 UTC (permalink / raw) To: davidm, davidm; +Cc: linux-kernel From: David Mosberger <davidm@napali.hpl.hp.com> Date: Tue, 10 Jun 2003 13:01:29 -0700 How about the attached patch? Looks "Obviously correct" ;-) I'll apply this, thanks. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-07 7:11 ` David S. Miller 2003-06-10 20:01 ` David Mosberger @ 2003-06-15 7:06 ` Anton Blanchard 2003-06-15 7:11 ` David S. Miller 1 sibling, 1 reply; 29+ messages in thread From: Anton Blanchard @ 2003-06-15 7:06 UTC (permalink / raw) To: David S. Miller; +Cc: davidm, davidm, manfred, axboe, linux-kernel > But you're creating a new mapping for the old buffer. What if you had > a DMA API implementation which consolidates multiple mapping attempts > of the same buffer into a single mapping entry (along with a reference > count)? That would break the workaround. > > I hope nobody is doing this, it would probably break other things > we haven't considered yet. Dave, we talked about this ages ago as a possible alternative to skb recycling and persistent IOMMU mappings for those skbs. Unfortunately you need a hash to map from all of memory to a pci bus address for each host bridge (well IOMMU), and so far I can't think of anything that wouldn't chew gobs of RAM. Anton ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-15 7:06 ` Anton Blanchard @ 2003-06-15 7:11 ` David S. Miller 2003-06-15 8:04 ` Anton Blanchard 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2003-06-15 7:11 UTC (permalink / raw) To: anton; +Cc: davidm, davidm, manfred, axboe, linux-kernel From: Anton Blanchard <anton@samba.org> Date: Sun, 15 Jun 2003 17:06:11 +1000 Dave, we talked about this ages ago as a possible alternative to skb recycling and persistent IOMMU mappings for those skbs. Unfortunately you need a hash to map from all of memory to a pci bus address for each host bridge (well IOMMU), and so far I cant think of anything that wouldnt chew gobs of RAM. The hash table need not be sized wrt. ram, but rather to the expected amount of "DMA in flight" you can expect for the system. You have to make these tables per-IOMMU anyways, which breaks down the problem much further. ROFL, whose workstation name is krispykreme? :-) I just noticed that. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-15 7:11 ` David S. Miller @ 2003-06-15 8:04 ` Anton Blanchard 2003-06-15 8:18 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: Anton Blanchard @ 2003-06-15 8:04 UTC (permalink / raw) To: David S. Miller; +Cc: davidm, davidm, manfred, axboe, linux-kernel > The hash table need not be sized wrt. ram, but rather to the expected > amount of "DMA in flight" you can expect for the system. > > You have to make these tables per-IOMMU anyways, which breaks down the > problem much further. That turns out to be a problem because we can have a bunch of IOMMUs on ppc64 - we might have 4 slots per IOMMU. Also, on a partitioned machine, each slot gets an exclusive range of bus addresses to provide isolation between adapters. At the extreme end we might have 512GB of RAM and 48 IOMMUs. A hash table per IOMMU could end up using a lot of RAM. To make the best use of the IOMMU we would want to have it filled with addresses (to have the best chance that we will find an existing mapping). And for zero copy stuff it could be all over RAM. In effect we are looking at a mapping from 512GB -> 3GB. (assuming the top 1GB of PCI addresses are for PCI IO and memory) We also need a quick way to find old entries in order to purge them. > ROFL, whose workstation name is krispykreme? :) > I just noticed that. Hey, that's my laptop :) One is opening up in Penrith Australia next week, I may never need to visit the US again :) Anton ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-15 8:04 ` Anton Blanchard @ 2003-06-15 8:18 ` David S. Miller 0 siblings, 0 replies; 29+ messages in thread From: David S. Miller @ 2003-06-15 8:18 UTC (permalink / raw) To: anton; +Cc: davidm, davidm, manfred, axboe, linux-kernel From: Anton Blanchard <anton@samba.org> Date: Sun, 15 Jun 2003 18:04:42 +1000 To make the best use of the IOMMU we would want to have it filled with addresses (to have the best chance that we will find an existing mapping). And for zero copy stuff it could be all over RAM. In effect we are looking at a mapping from 512GB -> 3GB. (assuming the top 1GB of PCI addresses are for PCI IO and memory) We have this same problem with huge files in the page cache. Maybe you should use a trie. Because that data structure will expand wrt. actual use. I bet the management (especially the shared memory refcount bump) will outweigh the costs saved unless your hardware is really stupid. But judging by the e1000/ppc64 issue, your hardware is stupid and thus maybe it's worthwhile for you to continue to pursue this idea :) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit() 2003-06-07 6:44 ` David S. Miller @ 2003-06-07 7:20 ` David Mosberger 2003-06-07 7:19 ` David S. Miller ` (2 more replies) 1 sibling, 3 replies; 29+ messages in thread From: David Mosberger @ 2003-06-07 7:20 UTC (permalink / raw) To: David S. Miller; +Cc: davidm, manfred, axboe, linux-kernel >>>>> On Fri, 06 Jun 2003 23:44:01 -0700 (PDT), "David S. Miller" <davem@redhat.com> said: DaveM> This sounds all very dramatic, but try as I might, all I DaveM> find is three places where PCI_DMA_BUS_IS_PHYS is used: DaveM> Fix your grep, What am I missing? --david

$ find . -type f | xargs fgrep PCI_DMA_BUS_IS_PHYS | fgrep -v /SCCS/
./drivers/ide/ide-lib.c:	if (!PCI_DMA_BUS_IS_PHYS)
./drivers/net/tg3.c:#ifndef PCI_DMA_BUS_IS_PHYS
./drivers/net/tg3.c:#define PCI_DMA_BUS_IS_PHYS 1
./drivers/net/tg3.c:#if !PCI_DMA_BUS_IS_PHYS
./drivers/net/tg3.c:#if !PCI_DMA_BUS_IS_PHYS
./drivers/scsi/scsi_lib.c:	if (PCI_DMA_BUS_IS_PHYS && host_dev && host_dev->dma_mask)
./include/asm-alpha/pci.h:#define PCI_DMA_BUS_IS_PHYS 0
./include/asm-arm/pci.h:#define PCI_DMA_BUS_IS_PHYS (0)
./include/asm-i386/pci.h:#define PCI_DMA_BUS_IS_PHYS (1)
./include/asm-ia64/pci.h: * PCI_DMA_BUS_IS_PHYS should be set to 1 if there is necessarily a direct corespondence
./include/asm-ia64/pci.h:#define PCI_DMA_BUS_IS_PHYS pci_dma_bus_is_phys
./include/asm-m68k/pci.h:#define PCI_DMA_BUS_IS_PHYS (1)
./include/asm-parisc/pci.h:#define PCI_DMA_BUS_IS_PHYS (1)
./include/asm-ppc/pci.h:#define PCI_DMA_BUS_IS_PHYS (1)
./include/asm-ppc64/pci.h:#define PCI_DMA_BUS_IS_PHYS (0)
./include/asm-s390/pci.h:#define PCI_DMA_BUS_IS_PHYS (1)
./include/asm-sparc/pci.h:#define PCI_DMA_BUS_IS_PHYS (0)
./include/asm-sparc64/pci.h:#define PCI_DMA_BUS_IS_PHYS (0)
./include/asm-um/pci.h:#define PCI_DMA_BUS_IS_PHYS (1)
./include/asm-x86_64/pci.h:#define PCI_DMA_BUS_IS_PHYS (0)
./include/asm-x86_64/pci.h:#define PCI_DMA_BUS_IS_PHYS 1

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: problem with blk_queue_bounce_limit()
  2003-06-07  7:20 ` David Mosberger
@ 2003-06-07  7:19 ` David S. Miller
  2003-06-07  9:44 ` Russell King
  2003-06-10 20:24 ` David Mosberger
2 siblings, 0 replies; 29+ messages in thread
From: David S. Miller @ 2003-06-07 7:19 UTC (permalink / raw)
To: davidm, davidm; +Cc: manfred, axboe, linux-kernel

From: David Mosberger <davidm@napali.hpl.hp.com>
Date: Sat, 7 Jun 2003 00:20:46 -0700

   >>>>> On Fri, 06 Jun 2003 23:44:01 -0700 (PDT), "David S. Miller" <davem@redhat.com> said:

     DaveM> This sounds all very dramatic, but try as I might, all I
     DaveM> find is three places where PCI_DMA_BUS_IS_PHYS is used:

     DaveM> Fix your grep,

   What am I missing?

Sorry, I was meaning to mention BIO_VMERGE_BOUNDARY, which works in
concert with PCI_DMA_BUS_IS_PHYS.  You really have to set it accurately
to get the most out of the block layer's ability to take advantage of
IOMMUs.
* Re: problem with blk_queue_bounce_limit()
  2003-06-07  7:20 ` David Mosberger
  2003-06-07  7:19 ` David S. Miller
@ 2003-06-07  9:44 ` Russell King
  2003-06-07  9:47 ` David S. Miller
  2003-06-10 20:24 ` David Mosberger
2 siblings, 1 reply; 29+ messages in thread
From: Russell King @ 2003-06-07 9:44 UTC (permalink / raw)
To: davidm; +Cc: David S. Miller, manfred, axboe, linux-kernel

On Sat, Jun 07, 2003 at 12:20:46AM -0700, David Mosberger wrote:
> ./include/asm-arm/pci.h:#define PCI_DMA_BUS_IS_PHYS (0)

I suspect we probably set this incorrectly; we have some platforms where
there is merely an offset between the phys address and the bus address.
For these, I think we want to set this to 1.

Other platforms require the dma functions to allocate a new buffer and
copy the data to work around buggy "wont fix" errata (eg, new buffer
below 1MB), and for these I think we want to leave this at 0.

It is rather unfortunate that this got called "PCI_xxx" since it has
been used in a non pci-bus manner in (eg) the scsi layer.

Also note that I have platforms where the dma_mask is a real mask, not
"a set of zeros followed by a set of ones from MSB to LSB."  I can see
this breaking the block layer if PCI_DMA_BUS_IS_PHYS is defined to one. 8/

--
Russell King (rmk@arm.linux.org.uk)  The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
* Re: problem with blk_queue_bounce_limit()
  2003-06-07  9:44 ` Russell King
@ 2003-06-07  9:47 ` David S. Miller
  2003-06-07 13:23 ` Christoph Hellwig
0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2003-06-07 9:47 UTC (permalink / raw)
To: rmk; +Cc: davidm, manfred, axboe, linux-kernel

From: Russell King <rmk@arm.linux.org.uk>
Date: Sat, 7 Jun 2003 10:44:34 +0100

   It is rather unfortunate that this got called "PCI_xxx" since it has
   been used in a non pci-bus manner in (eg) the scsi layer.

I agree; the idea at the time that Jens and myself did this work was
that the generic device layer would provide interfaces by which we could
test this given a struct device.
* Re: problem with blk_queue_bounce_limit()
  2003-06-07  9:47 ` David S. Miller
@ 2003-06-07 13:23 ` Christoph Hellwig
0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2003-06-07 13:23 UTC (permalink / raw)
To: David S. Miller; +Cc: rmk, davidm, manfred, axboe, linux-kernel

On Sat, Jun 07, 2003 at 02:47:04AM -0700, David S. Miller wrote:
> I agree, the idea at the time that Jens and myself did this
> work was that the generic device layer would provide interfaces
> by which we could test this given a struct device.

Wouldn't that be the dma_is_phys(dev) call I mentioned earlier in this
thread that you didn't like? :)
* Re: problem with blk_queue_bounce_limit()
  2003-06-07  7:20 ` David Mosberger
  2003-06-07  7:19 ` David S. Miller
  2003-06-07  9:44 ` Russell King
@ 2003-06-10 20:24 ` David Mosberger
2 siblings, 0 replies; 29+ messages in thread
From: David Mosberger @ 2003-06-10 20:24 UTC (permalink / raw)
To: hch, James.Bottomley; +Cc: davem, axboe, linux-kernel

Christoph,

You wrote:

> imo PCI_DMA_BUS_IS_PHYS should be a property of each struct device
> because a machine might have an iommu for one bus type but not
> another, e.g.
>
>	dma_is_phys(dev);

As pointed out by DaveM, this isn't sufficient for the block layer,
which needs to know the page size of the I/O MMU so it can make merging
decisions about physically discontiguous buffers.  I think we also need:

	/*
	 * Returns a mask of bits which need to be 0 in order for
	 * the DMA-mapping interface to be able to remap a buffer.
	 * DMA-mapping implementations for real (hardware) I/O MMUs
	 * will want to return (iommu_page_size - 1) here, if they
	 * support such remapping.  DMA-mapping implementations which
	 * do not support remapping must return a mask of all 1s.
	 */
	unsigned long dma_merge_mask(dev);

Then you can replace:

	BIO_VMERGE_BOUNDARY => (dma_merge_mask(dev) + 1)

Of course, this doesn't work literally, because we don't have a device
pointer handy in the bio code.  Instead, it would probably make the most
sense to add an "iommu_merge_mask" member to "struct request_queue" and
then do something along the lines of:

	#define BIO_VMERGE_BOUNDARY(q)	((q)->iommu_merge_mask + 1)

Note 1: the "+ 1" will get optimized away, because the only way
BIO_VMERGE_BOUNDARY() is used is in BIOVEC_VIRT_MERGEABLE, which really
needs a mask anyhow; this could be cleaned up of course, but that's a
separate issue.

Note 2: dma_merge_mask() cannot be used to replace dma_is_phys() (as
much as I'd like that), because the issue of (virtual) remapping is
really quite distinct from whether a (hardware) I/O MMU is present (not
to mention the _other_ reason that a bus may not be "physical").

Note 3: I'm not comfortable hacking the bio code, so if someone would
like to prototype this, by all means go ahead... ;-)

Thanks,

	--david
* Re: problem with blk_queue_bounce_limit()
@ 2003-06-06 18:06 James Bottomley
0 siblings, 0 replies; 29+ messages in thread
From: James Bottomley @ 2003-06-06 18:06 UTC (permalink / raw)
To: David S. Miller; +Cc: Christoph Hellwig, Linux Kernel

   From: Christoph Hellwig <hch@infradead.org>
   Date: Fri, 6 Jun 2003 08:44:10 +0100

      PCI_DMA_BUS_IS_PHYS should be a property of each struct device
      because a machine might have an iommu for one bus type but not
      another,

   We know of no such systems.  Even in mixed-bus environments such as
   sparc64 SBUS+PCI systems.

Actually, I know of such a system, although I will admit they're
obscure.  Certain PA-RISC machines with EISA buses have an IOMMU on the
EISA bus alone (not on the GSC bus).  Its job was really to cope with
the ISA DMA issue nicely, but it qualifies.

It was for machines like this that I've already been thinking about
introducing a dma_is_phys(dev) macro, which I originally left out of the
generic device DMA API.

James
end of thread, other threads: [~2003-06-15 8:09 UTC | newest]

Thread overview: 29+ messages -- links below jump to the message on this page --
2003-06-05  6:42 problem with blk_queue_bounce_limit() David Mosberger
2003-06-05  7:20 ` David S. Miller
2003-06-06  6:42 ` David Mosberger
2003-06-06  6:45 ` David S. Miller
2003-06-06  6:54 ` David Mosberger
2003-06-06  7:08 ` David S. Miller
2003-06-06  6:52 ` David S. Miller
2003-06-06  7:19 ` David Mosberger
2003-06-06  7:32 ` David S. Miller
2003-06-06  7:44 ` Christoph Hellwig
2003-06-06  7:43 ` David S. Miller
2003-06-06  7:51 ` Christoph Hellwig
2003-06-06 20:13 ` David Mosberger
2003-06-07  6:44 ` David S. Miller
2003-06-07  7:05 ` David Mosberger
2003-06-07  7:11 ` David S. Miller
2003-06-10 20:01 ` David Mosberger
2003-06-12  6:47 ` David S. Miller
2003-06-15  7:06 ` Anton Blanchard
2003-06-15  7:11 ` David S. Miller
2003-06-15  8:04 ` Anton Blanchard
2003-06-15  8:18 ` David S. Miller
2003-06-07  7:20 ` David Mosberger
2003-06-07  7:19 ` David S. Miller
2003-06-07  9:44 ` Russell King
2003-06-07  9:47 ` David S. Miller
2003-06-07 13:23 ` Christoph Hellwig
2003-06-10 20:24 ` David Mosberger
2003-06-06 18:06 James Bottomley