All of lore.kernel.org
 help / color / mirror / Atom feed
* bug in bio_map_user_iov
@ 2010-11-17 21:24 Michael Demmer
  2010-11-18  0:02 ` Neil Brown
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Demmer @ 2010-11-17 21:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Neil Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 626 bytes --]

Hello all,

I've been doing some work with a Linux kernel module that enables zero-copy I/O to block devices using a custom user/kernel interface. In porting this from an older Linux kernel to a more modern release, I ran into an issue when interacting with MD devices that I traced back to what I believe to be a problem in bio_map_user_iov.

The problem and fix are described in the first attached patch. The second is a simple test module and user program which triggers the bug and validates the fix.

Thanks,
-m

ps. This is my first attempt at pushing a patch upstream so please forgive any newbie mistakes.


[-- Attachment #2: 0001-kernel-fix-bug-in-bio_map_user_iov-on-MD-devices.patch --]
[-- Type: application/octet-stream, Size: 1462 bytes --]

From 3bfbc8ed41c9232e8fa0d55e4093935d0af32a75 Mon Sep 17 00:00:00 2001
From: Michael Demmer <demmer@riverbed.com>
Date: Wed, 17 Nov 2010 11:22:44 -0800
Subject: [PATCH 1/2] kernel: fix bug in bio_map_user_iov on MD devices

While building up the iovec to pass to the lower level device,
__bio_map_user_iov calls __bio_add_page which tries to merge
consecutive block pointers rather than splitting them into
multiple iov entries, giving the underlying driver an opportunity
to restrict these merges through the merge_bvec_fn.

For MD devices, the merge function looks through the bio at the
bi_bdev, however bio_map_user_iov doesn't assign the block_device
pointer to the bio until after all the merging is done. This patch
simply reorders the assignment so that bio operations on MD devices
don't crash on a null pointer.
---
 fs/bio.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 4bd454f..7f29582 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -989,6 +989,8 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	if (!bio)
 		return ERR_PTR(-ENOMEM);
 
+	bio->bi_bdev = bdev;
+
 	ret = -ENOMEM;
 	pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
 	if (!pages)
@@ -1046,7 +1048,6 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	if (!write_to_vm)
 		bio->bi_rw |= REQ_WRITE;
 
-	bio->bi_bdev = bdev;
 	bio->bi_flags |= (1 << BIO_USER_MAPPED);
 	return bio;
 
-- 
1.7.1


[-- Attachment #3: 0002-Test-program-to-verify-the-bio_map_user-bug-fix-for-.patch --]
[-- Type: application/octet-stream, Size: 4399 bytes --]

From 563d925fa93a04d12d1ad869b306193198f688c1 Mon Sep 17 00:00:00 2001
From: Michael Demmer <demmer@riverbed.com>
Date: Wed, 17 Nov 2010 11:11:09 -0800
Subject: [PATCH 2/2] Test program to verify the bio_map_user bug fix for md devices.

Usage:
cd bio_map_user_bug
insmod bio_bug.ko
mknod /dev/bio_bug c 999 1
./bio_bug_test /dev/mdX
---
 bio_map_user_bug/Makefile       |   16 ++++++++
 bio_map_user_bug/bio_bug.h      |    5 +++
 bio_map_user_bug/bio_bug_mod.c  |   74 +++++++++++++++++++++++++++++++++++++++
 bio_map_user_bug/bio_bug_test.c |   44 +++++++++++++++++++++++
 4 files changed, 139 insertions(+), 0 deletions(-)
 create mode 100644 bio_map_user_bug/Makefile
 create mode 100644 bio_map_user_bug/bio_bug.h
 create mode 100644 bio_map_user_bug/bio_bug_mod.c
 create mode 100644 bio_map_user_bug/bio_bug_test.c

diff --git a/bio_map_user_bug/Makefile b/bio_map_user_bug/Makefile
new file mode 100644
index 0000000..80756c5
--- /dev/null
+++ b/bio_map_user_bug/Makefile
@@ -0,0 +1,16 @@
+
+obj-m:= bio_bug.o
+
+bio_bug-objs := bio_bug_mod.o
+
+all: bio_bug.ko bio_bug_test
+
+bio_bug.ko:
+	make -C .. M=`pwd` modules
+
+bio_bug_test: bio_bug_test.c
+	$(CC) -Wall -g $< -o $@
+
+clean:
+	make -C .. M=`pwd` clean
+	rm -f bio_bug_test
diff --git a/bio_map_user_bug/bio_bug.h b/bio_map_user_bug/bio_bug.h
new file mode 100644
index 0000000..97aa0b9
--- /dev/null
+++ b/bio_map_user_bug/bio_bug.h
@@ -0,0 +1,5 @@
+
+struct bio_map_user_bug_args {
+	char devname[64];
+	unsigned long useraddr;
+};
diff --git a/bio_map_user_bug/bio_bug_mod.c b/bio_map_user_bug/bio_bug_mod.c
new file mode 100644
index 0000000..d01bf5b
--- /dev/null
+++ b/bio_map_user_bug/bio_bug_mod.c
@@ -0,0 +1,74 @@
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/blkdev.h>
+#include <linux/version.h>
+#include <linux/err.h>
+
+#include "bio_bug.h"
+
+static long
+bug_ioctl(struct file *file, unsigned int c, unsigned long v)
+{
+    struct bio_map_user_bug_args args;
+    struct block_device *bdev;
+    struct request_queue *q;
+    struct bio *bio;
+
+    if (copy_from_user(&args, (void*)v, sizeof(args))) {
+        return -EFAULT;
+    }
+
+    printk("testing bio_map_user bug on %s...\n", args.devname);
+
+    bdev = open_bdev_exclusive(args.devname, FMODE_READ|FMODE_WRITE, (void*)0x01010101);
+    if (IS_ERR(bdev)) {
+	printk("error opening block device\n");
+	return -EINVAL;
+    }	
+
+    q = bdev_get_queue(bdev);
+    if (q == NULL) {
+        printk("block device has queue missing.\n");
+        return -EFAULT;
+    }
+    
+    // will crash here
+    bio = bio_map_user(q, bdev, args.useraddr, 4096, 0, GFP_KERNEL);
+
+    close_bdev_exclusive(bdev, FMODE_READ|FMODE_WRITE);
+    
+    printk("testing bio_map_user bug on %s... ok\n", args.devname);
+    return 0;
+}
+
+static struct file_operations bio_bug_fops = {
+    owner: THIS_MODULE,
+    unlocked_ioctl: bug_ioctl
+};
+
+static int __init 
+bug_init(void) 
+{    
+    printk("registering device\n");
+
+    if (register_chrdev(999, "bio_bug", &bio_bug_fops)) {
+        printk("ERROR register_chrdev for %u in /dev/%s.\n",
+               999, "bio_bug");
+    }
+
+    return 0;
+}
+
+static void __exit 
+bug_exit(void) 
+{
+    printk("unregistering device\n");
+    unregister_chrdev(999, "bio_bug");
+}
+
+
+module_init(bug_init);
+module_exit(bug_exit);
+
+MODULE_LICENSE("GPL");
diff --git a/bio_map_user_bug/bio_bug_test.c b/bio_map_user_bug/bio_bug_test.c
new file mode 100644
index 0000000..1f97b71
--- /dev/null
+++ b/bio_map_user_bug/bio_bug_test.c
@@ -0,0 +1,44 @@
+
+#include <stdio.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <malloc.h>
+#include <string.h>
+#include <sys/ioctl.h>
+
+#include "bio_bug.h"
+
+int main(int argc, char* argv[]) {
+	int fd;
+	struct bio_map_user_bug_args args;
+	void* p;
+
+	if (argc < 2) {
+		printf("usage: %s <devname>\n", argv[0]);
+		return 1;
+	}	
+
+	fd = open("/dev/bio_bug", O_RDWR);
+	if (fd < 0) {
+		printf("error opening device: %s\n", strerror(errno));
+		return 1;
+	}
+
+	p = memalign(4096, 4096);
+	if (p == 0) {
+		printf("error in memalign: %s\n", strerror(errno));
+		return 1;
+	}
+
+	strncpy(args.devname, argv[1], sizeof(args.devname));
+	args.useraddr = (unsigned long)p;
+
+	if (ioctl(fd, 999, &args) != 0) {
+		printf("error in ioctl: %s\n", strerror(errno));
+		return 1;
+	}
+
+	return 0;
+}
+
+
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: bug in bio_map_user_iov
  2010-11-17 21:24 bug in bio_map_user_iov Michael Demmer
@ 2010-11-18  0:02 ` Neil Brown
  2010-11-18 21:08   ` Michael Demmer
  0 siblings, 1 reply; 4+ messages in thread
From: Neil Brown @ 2010-11-18  0:02 UTC (permalink / raw)
  To: Michael Demmer; +Cc: Jens Axboe, linux-raid

On Wed, 17 Nov 2010 13:24:47 -0800
Michael Demmer <Michael.Demmer@riverbed.com> wrote:

> Hello all,
> 
> I've been doing some work with a Linux kernel module that enables zero-copy I/O to block devices using a custom user/kernel interface. In porting this from an older Linux kernel to a more modern release, I ran into an issue when interacting with MD devices that I traced back to what I believe to be a problem in bio_map_user_iov.
> 
> The problem and fix are described in the first attached patch. The second is a simple test module and user program which triggers the bug and validates the fix.
> 
> Thanks,
> -m
> 
> ps. This is my first attempt at pushing a patch upstream so please forgive any newbie mistakes.
> 

I think the real issue here is that bio_map_user is an interface that was
only intended to be used by bottom level devices like SCSI drivers etc.
It is a function that a device driver can use if it knows that it makes sense
to use it.

You are trying to use it as a generic interface that works for all block
devices, and it wasn't intended for that.

So while it is reasonably simple to 'fix' bio_map_user_iov, it is not
possible to 'fix' bio_map_kern_iov in the same way, because it doesn't have
access to the bdev at all.


So the question we should be asking is: are you really using the right
interface for the job?  Is bio_map_user something that you really should be
using?
And to answer that, we would need to know what you are trying to do.

And why isn't O_DIRECT a suitable zero-copy interface for I/O to block
devices?

NeilBrown


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bug in bio_map_user_iov
  2010-11-18  0:02 ` Neil Brown
@ 2010-11-18 21:08   ` Michael Demmer
  2010-11-19 20:08     ` John Stoffel
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Demmer @ 2010-11-18 21:08 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jens Axboe, linux-raid

>> Hello all,
>> 
>> I've been doing some work with a Linux kernel module that enables zero-copy I/O to block devices using a custom user/kernel interface. In porting this from an older Linux kernel to a more modern release, I ran into an issue when interacting with MD devices that I traced back to what I believe to be a problem in bio_map_user_iov.
>> 
>> The problem and fix are described in the first attached patch. The second is a simple test module and user program which triggers the bug and validates the fix.
>> 
>> Thanks,
>> -m
>> 
>> ps. This is my first attempt at pushing a patch upstream so please forgive any newbie mistakes.
>> 
> 
> I think the real issue here is that bio_map_user is an interface that was
> only intended to be used by bottom level devices like SCSI drivers etc.
> It is a function that a device driver can use if it knows that it makes sense
> to use it.
> 
> You are trying to use it as a generic interface that works for all block
> devices, and it wasn't intended for that.
> 
> So while it is reasonably simple to 'fix' bio_map_user_iov, it is not
> possible to 'fix' bio_map_kern_iov in the same way, because it doesn't have
> access to the bdev at all.
> 
> 
> So the question we should be asking is: are you really using the right
> interface for the job?  Is bio_map_user something that you really should be
> using?
> And to answer that, we would need to know what you are trying to do.
> 
> And why isn't O_DIRECT a suitable zero-copy interface for I/O to block
> devices?

Thanks -- I understand your concerns, but unfortunately I can't really go into more detail about our use case since it's something we use internally. Given which, I understand your potential reluctance to take the patch.

Best,
-m




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bug in bio_map_user_iov
  2010-11-18 21:08   ` Michael Demmer
@ 2010-11-19 20:08     ` John Stoffel
  0 siblings, 0 replies; 4+ messages in thread
From: John Stoffel @ 2010-11-19 20:08 UTC (permalink / raw)
  To: Michael Demmer; +Cc: Neil Brown, Jens Axboe, linux-raid

>>>>> "Michael" == Michael Demmer <Michael.Demmer@riverbed.com> writes:

Michael> Thanks -- I understand your concerns, but unfortunately I
Michael> can't really go into more detail about our use case since
Michael> it's something we use internally. Given which, I understand
Michael> your potential reluctance to take the patch.

Duh, they're an IP accelerator company, so of course the want to
maximize speed from their RAID array on the appliance to the various
ethernet interfaces.  

John

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-11-19 20:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-17 21:24 bug in bio_map_user_iov Michael Demmer
2010-11-18  0:02 ` Neil Brown
2010-11-18 21:08   ` Michael Demmer
2010-11-19 20:08     ` John Stoffel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.