* [PATCH 2.6.19.1] fix aoe without scatter-gather [Bug 7662] [not found] <20061209234305.c65b4e14.akpm@osdl.org> @ 2006-12-18 17:53 ` Ed L. Cashin 2006-12-18 22:21 ` bio pages with zero page reference count Ed L. Cashin 0 siblings, 1 reply; 4+ messages in thread From: Ed L. Cashin @ 2006-12-18 17:53 UTC (permalink / raw) To: linux-kernel Cc: Greg KH, boddingt, Andrew Morton, bugme-daemon@kernel-bugs.osdl.org The patch below fixes a bug that only appears when AoE goes over a network card that does not support scatter-gather. The headers in the linear part of the skb appeared to be larger than they really were, resulting in data that was offset by 24 bytes. This patch eliminates the offset data on cards that don't support scatter-gather or have had scatter-gather turned off. There remains an unrelated issue that I'll address in a separate email. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> diff -uprN linux-2.6.19.orig/drivers/block/aoe/aoecmd.c linux-2.6.19.mod/drivers/block/aoe/aoecmd.c --- linux-2.6.19.orig/drivers/block/aoe/aoecmd.c 2006-12-11 18:15:42.322711000 -0500 +++ linux-2.6.19.mod/drivers/block/aoe/aoecmd.c 2006-12-12 17:12:59.307200500 -0500 @@ -30,8 +30,6 @@ new_skb(ulong len) skb->nh.raw = skb->mac.raw = skb->data; skb->protocol = __constant_htons(ETH_P_AOE); skb->priority = 0; - skb_put(skb, len); - memset(skb->head, 0, len); skb->next = skb->prev = NULL; /* tell the network layer not to perform IP checksums @@ -122,8 +120,8 @@ aoecmd_ata_rw(struct aoedev *d, struct f skb = f->skb; h = (struct aoe_hdr *) skb->mac.raw; ah = (struct aoe_atahdr *) (h+1); - skb->len = sizeof *h + sizeof *ah; - memset(h, 0, ETH_ZLEN); + skb_put(skb, sizeof *h + sizeof *ah); + memset(h, 0, skb->len); f->tag = aoehdr_atainit(d, h); f->waited = 0; f->buf = buf; @@ -149,7 +147,6 @@ aoecmd_ata_rw(struct aoedev *d, struct f skb->len += bcnt; skb->data_len = bcnt; } else { - skb->len = ETH_ZLEN; writebit = 0; } @@ -206,6 +203,7 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigne printk(KERN_INFO "aoe: skb alloc failure\n"); continue; } + skb_put(skb, sizeof *h + sizeof *ch); skb->dev = ifp; if (sl_tail == NULL) sl_tail = skb; @@ -243,6 +241,7 @@ freeframe(struct aoedev *d) continue; if (atomic_read(&skb_shinfo(f->skb)->dataref) == 1) { skb_shinfo(f->skb)->nr_frags = f->skb->data_len = 0; + skb_trim(f->skb, 0); return f; } n++; @@ -698,8 +697,8 @@ aoecmd_ata_id(struct aoedev *d) skb = f->skb; h = (struct aoe_hdr *) skb->mac.raw; ah = (struct aoe_atahdr *) (h+1); - skb->len = ETH_ZLEN; - memset(h, 0, ETH_ZLEN); + skb_put(skb, sizeof *h + sizeof *ah); + memset(h, 0, skb->len); f->tag = aoehdr_atainit(d, h); f->waited = 0; -- Ed L Cashin <ecashin@coraid.com> ^ permalink raw reply [flat|nested] 4+ messages in thread
* bio pages with zero page reference count 2006-12-18 17:53 ` [PATCH 2.6.19.1] fix aoe without scatter-gather [Bug 7662] Ed L. Cashin @ 2006-12-18 22:21 ` Ed L. Cashin 2006-12-18 22:53 ` Christoph Hellwig 0 siblings, 1 reply; 4+ messages in thread From: Ed L. Cashin @ 2006-12-18 22:21 UTC (permalink / raw) To: linux-kernel Cc: Greg KH, boddingt, Andrew Morton, bugme-daemon, Christoph Hellwig (This email is a followup to "Re: [PATCH 2.6.19.1] fix aoe without scatter-gather [Bug 7662]".) On Mon, Dec 18, 2006 at 12:53:00PM -0500, Ed L. Cashin wrote: ... > This patch eliminates the offset data on cards that don't support > scatter-gather or have had scatter-gather turned off. There remains > an unrelated issue that I'll address in a separate email. After fixing the problem with the skb headers, we noticed that there were still problems when scatter gather wasn't in use. XFS was giving us bios that had pages with a reference count of zero. The aoe driver sets up the skb with the frags pointing to the pages, and when scatter gather isn't supported and __pskb_pull_tail gets involved, put_page is called after the data is copied from the pages. That causes problems because of the zero page reference count. It seems like it would always be incorrect for one part of the kernel to give pages with a zero reference count to another part of the kernel, so this seems like a bug in XFS. Christoph Hellwig, though, points out, > It's a kmalloced page. The same can happen with ext3 aswell, but > only when doing log recovery. The last time this came up (vs > iscsi) the conclusion was that the driver needs to handle this > case. In attempting to find the conversation he was referencing, I only found this: Subject: tcp_sendpage and page allocation lifetime vs. iscsi Date: 2005-04-25 17:02:59 GMT http://article.gmane.org/gmane.linux.kernel/298377 If anyone has a better reference, I'd like to see it. -- Ed L Cashin <ecashin@coraid.com> ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: bio pages with zero page reference count 2006-12-18 22:21 ` bio pages with zero page reference count Ed L. Cashin @ 2006-12-18 22:53 ` Christoph Hellwig 2007-01-19 16:21 ` Ed L. Cashin 0 siblings, 1 reply; 4+ messages in thread From: Christoph Hellwig @ 2006-12-18 22:53 UTC (permalink / raw) To: support Cc: linux-kernel, Greg KH, boddingt, Andrew Morton, bugme-daemon, Christoph Hellwig On Mon, Dec 18, 2006 at 05:21:09PM -0500, Ed L. Cashin wrote: > (This email is a followup to "Re: [PATCH 2.6.19.1] fix aoe without > scatter-gather [Bug 7662]".) > > On Mon, Dec 18, 2006 at 12:53:00PM -0500, Ed L. Cashin wrote: > ... > > This patch eliminates the offset data on cards that don't support > > scatter-gather or have had scatter-gather turned off. There remains > > an unrelated issue that I'll address in a separate email. > > After fixing the problem with the skb headers, we noticed that there > were still problems when scatter gather wasn't in use. XFS was giving > us bios that had pages with a reference count of zero. > > The aoe driver sets up the skb with the frags pointing to the pages, > and when scatter gather isn't supported and __pskb_pull_tail gets > involved, put_page is called after the data is copied from the pages. > That causes problems because of the zero page reference count. > > It seems like it would always be incorrect for one part of the kernel > to give pages with a zero reference count to another part of the > kernel, so this seems like a bug in XFS. > > Christoph Hellwig, though, points out, > > > It's a kmalloced page. The same can happen with ext3 aswell, but > > only when doing log recovery. The last time this came up (vs > > iscsi) the conclusion was that the driver needs to handle this > > case. > > In attempting to find the conversation he was referencing, I only > found this: > > Subject: tcp_sendpage and page allocation lifetime vs. iscsi > Date: 2005-04-25 17:02:59 GMT > http://article.gmane.org/gmane.linux.kernel/298377 > > If anyone has a better reference, I'd like to see it. I searched around a little bit and found these: http://groups.google.at/group/open-iscsi/browse_frm/thread/17fbe253cf1f69dd/f26cf19b0fee9147?tvc=1&q=kmalloc+iscsi+%22christoph+hellwig%22&hl=de#f26cf19b0fee9147 http://www.ussg.iu.edu/hypermail/linux/kernel/0408.3/0061.html But that's not the conclusion I was looking for. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Re: bio pages with zero page reference count 2006-12-18 22:53 ` Christoph Hellwig @ 2007-01-19 16:21 ` Ed L. Cashin 0 siblings, 0 replies; 4+ messages in thread From: Ed L. Cashin @ 2007-01-19 16:21 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-kernel, Andrew Morton, xfs, Alan Cox On Mon, Dec 18, 2006 at 10:53:43PM +0000, Christoph Hellwig wrote: > On Mon, Dec 18, 2006 at 05:21:09PM -0500, Ed L. Cashin wrote: ... > > If anyone has a better reference, I'd like to see it. > > I searched around a little bit and found these: > > http://groups.google.at/group/open-iscsi/browse_frm/thread/17fbe253cf1f69dd/f26cf19b0fee9147?tvc=1&q=kmalloc+iscsi+%22christoph+hellwig%22&hl=de#f26cf19b0fee9147 > http://www.ussg.iu.edu/hypermail/linux/kernel/0408.3/0061.html > > But that's not the conclusion I was looking for. So it sounds like you've been advocating a general discussion of this issue for a few years now. To summarize the issue: 1) users of the block layer assume that it's fine to associate pages that have a zero reference count with a bio before requesting I/O, 2) intermediaries like iscsi, aoe, and drbd, associate the pages with the frags of skbuffs, but 3) when the network layer has to linearize the skbuff for a network device that doesn't support scatter gather, it winds up doing a get_page and put_page on each page in the frags, despite the fact that the page reference count on each may already be zero. The network layer is assuming that it's OK to do use these operations on any page in the frags. Maybe the discussion is slow to start because too many parts of the kernel are involved. Here are a couple of specific questions. Maybe they'll help get the ball rolling. 1) What are the disadvantages of making the network layer *not* to assume it's correct to use get/put_page on the frags when it linearizes an sk_buff? For example, the network layer could omit the get/put_page when the page reference count is zero. 2) What are the disadvantages of having one part of the kernel (e.g., XFS) reference a page before handing it off to another part of the kernel, e.g., in a bio? This change would require multiple parts of the kernel to change behavior, but it seems conceptually cleaner, since the reference count would reflect the reality that the page does have an owner (XFS or whoever). I don't know how practical the implementation would be. 3) It seems messy to handle this is in each of the individual intermediary drivers that sit between the block and network layers, but if that really is the place to do it, then is there a problem with simply incrementing the page reference counts upon getting a bio from the block layer, and later decrementing them before giving them back with bio_endio? bio_for_each_segment(bv, bio, i) atomic_inc(&bv->bv_page->_count); ... [and later] bio_for_each_segment(bv, bio, i) atomic_dec(&bv->bv_page->_count); bio_endio(bio, bytes_done, error); That seems to eliminate problems aoe users have with XFS on AoE devices that are accessible via network devices that don't support scatter gather, but is it the right fix? Andrew Morton changed "count" to "_count" to stop folks from directly manipulating the page struct member, but I don't see any get/put_page type operations that fit what the aoe driver has to do in this case. -- Ed L Cashin <ecashin@coraid.com> ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2007-01-19 16:24 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20061209234305.c65b4e14.akpm@osdl.org> 2006-12-18 17:53 ` [PATCH 2.6.19.1] fix aoe without scatter-gather [Bug 7662] Ed L. Cashin 2006-12-18 22:21 ` bio pages with zero page reference count Ed L. Cashin 2006-12-18 22:53 ` Christoph Hellwig 2007-01-19 16:21 ` Ed L. Cashin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).