All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 2.6.20] updated dm-loop patch
@ 2007-02-12  8:49 roland
  2007-02-12 15:24 ` Bryn M. Reeves
  0 siblings, 1 reply; 9+ messages in thread
From: roland @ 2007-02-12  8:49 UTC (permalink / raw)
  To: Bryn M. Reeves; +Cc: dm-devel

Hi Bryn,

after some first tests which looked very very promising (could dmlosetup 
files >2gb, could create more than 256 loop dm-loop devices...), i have bad 
news.
maybe it`s not that "bad" because you may be able to fix it quickly, but it 
seems that dm-loop is racy (or whatever). maybe smp safeness, since i was 
testing on a P4 with HT enabled ? i did my first tests on non-smp system 
(VM), but it wasn`t put under that load as i did now.

i created some larger dm-loops, put a filesystem on that and generated some 
heavy load (loadavg >40)

- loop0: 1gig - reiserfs - untar kernel 2.6.20 and "make -j 16 bzImage 
modules"
- loop1: 5gig - ext3 - ditto
- loop2: mkfs.reiserfs needed some time because of system being so busy. 
while waiting for creation, i switched to the next vt
- while creating the 4th image-file and while mkfs.reiserfs was busy 
creating filesystem on loop2, bang - the whole thing crashed.

i have attached the dmesg output and i`m happy to assist for doing some more 
tests like this to get this nice piece of work stable ! :)

regards
roland

ps:
hey, why not announcing this on lkml, so this gut get some more notice or 
being reviewed by others?


Feb 12 12:58:56 test kernel: EXT3 FS on dm-1, internal journal
Feb 12 12:58:56 test kernel: EXT3-fs: mounted filesystem with ordered data 
mode.
Feb 12 12:58:56 test udevd-event[7064]: run_program: ressize 256 too short
Feb 12 13:21:05 test kernel: device-mapper: loop: not using 4294967296 bytes 
in incomplete block at EOF
Feb 12 13:21:05 test kernel: device-mapper: loop DEBUG: allocated linear 
extent map of 6049 bytes for 252 extents (8064 bytes)
Feb 12 13:21:05 test kernel: device-mapper: loop DEBUG: splitting io at 8 
sector boundaries
Feb 12 13:21:05 test kernel: device-mapper: loop DEBUG: constructed loop 
target on /usr/src/5gig.dat (5120000k, 1851392 sectors)
Feb 12 13:21:05 test udevd-event[9296]: run_program: ressize 256 too short
Feb 12 13:21:05 test udevd-event[9296]: run_program: ressize 256 too short
Feb 12 13:21:05 test udevd-event[9305]: run_program: ressize 256 too short
Feb 12 13:21:28 test kernel: device-mapper: loop: no matching extent in map 
for sector 10239992!
Feb 12 13:21:28 test kernel: Buffer I/O error on device dm-2, logical block 
1279999
Feb 12 13:21:28 test kernel: ------------[ cut here ]------------
Feb 12 13:21:28 test kernel: kernel BUG at drivers/md/dm-loop.c:260!
Feb 12 13:21:28 test kernel: invalid opcode: 0000 [#1]
Feb 12 13:21:28 test kernel: SMP
Feb 12 13:21:28 test kernel: Modules linked in: dm_loop
Feb 12 13:21:28 test kernel: CPU:    0
Feb 12 13:21:28 test kernel: EIP:    0060:[<f88d70b9>]    Not tainted VLI
Feb 12 13:21:28 test kernel: EFLAGS: 00010246   (2.6.20 #6)
Feb 12 13:21:28 test kernel: EIP is at contains_sector+0x40/0x48 [dm_loop]
Feb 12 13:21:28 test kernel: eax: 00000000   ebx: 00000000   ecx: 00000000 
edx: 009c3ff8
Feb 12 13:21:28 test kernel: esi: 009c3ff8   edi: 00000000   ebp: f1c46260 
esp: c917fbbc
Feb 12 13:21:28 test kernel: ds: 007b   es: 007b   ss: 0068
Feb 12 13:21:28 test kernel: Process mkfs.reiserfs (pid: 9718, ti=c917e000 
task=e16da550 task.ti=c917e000)
Feb 12 13:21:28 test kernel: Stack: dead83f0 f88d70c1 f1735378 f88d710e 
00001000 00000001 00000000 d9f02680
Feb 12 13:21:28 test kernel:        f8c0f080 009c3ff8 00000000 ef6006c0 
f1c4626c d9f02680 c016e761 f8c0f080
Feb 12 13:21:28 test kernel:        f88d70c1 f1735378 f8c0f080 f88d706b 
f487f9c0 d9f02680 c0399f1a 009c3ff8
Feb 12 13:21:28 test kernel: Call Trace:
Feb 12 13:21:28 test kernel:  [<f88d70c1>] loop_block_map+0x0/0x180 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<f88d710e>] loop_block_map+0x4d/0x180 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<c016e761>] __bio_clone+0x6f/0x8a
Feb 12 13:21:28 test kernel:  [<f88d70c1>] loop_block_map+0x0/0x180 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<f88d706b>] loop_map+0x27/0x35 [dm_loop]
Feb 12 13:21:28 test kernel:  [<c0399f1a>] __map_bio+0x34/0x99
Feb 12 13:21:28 test kernel:  [<c039a783>] __split_bio+0x173/0x41f
Feb 12 13:21:28 test kernel:  [<c014e7fe>] cache_alloc_refill+0x5b/0x488
Feb 12 13:21:28 test kernel:  [<c02179a3>] submit_bio+0xc0/0xc7
Feb 12 13:21:28 test kernel:  [<c039b113>] dm_request+0xc7/0xd4
Feb 12 13:21:28 test kernel:  [<c02159b9>] generic_make_request+0x1ae/0x1be
Feb 12 13:21:28 test kernel:  [<c016ddf8>] bio_put+0x23/0x24
Feb 12 13:21:28 test kernel:  [<c016b0cd>] submit_bh+0xe9/0xf6
Feb 12 13:21:28 test kernel:  [<c016fa14>] blkdev_get_block+0x0/0x43
Feb 12 13:21:28 test kernel:  [<c02179a3>] submit_bio+0xc0/0xc7
Feb 12 13:21:28 test kernel:  [<c01398b5>] mempool_alloc+0x28/0xc6
Feb 12 13:21:28 test kernel:  [<c013d079>] __pagevec_lru_add+0x85/0x90
Feb 12 13:21:28 test kernel:  [<c016dea9>] bio_alloc_bioset+0x9b/0xf3
Feb 12 13:21:28 test kernel:  [<c016b0bc>] submit_bh+0xd8/0xf6
Feb 12 13:21:28 test kernel:  [<c016d686>] block_read_full_page+0x2c8/0x2d9
Feb 12 13:21:28 test kernel:  [<c016fa14>] blkdev_get_block+0x0/0x43
Feb 12 13:21:28 test kernel:  [<c013cb63>] page_cache_readahead+0x12b/0x196
Feb 12 13:21:28 test kernel:  [<c01378bf>] 
do_generic_mapping_read+0x209/0x43b
Feb 12 13:21:28 test kernel:  [<c013971b>] generic_file_aio_read+0x19d/0x1ce
Feb 12 13:21:28 test kernel:  [<c0136fb8>] file_read_actor+0x0/0xd1
Feb 12 13:21:28 test kernel:  [<c0151353>] do_sync_read+0xc7/0x10a
Feb 12 13:21:28 test kernel:  [<c012b2ef>] remove_wait_queue+0xc/0x34
Feb 12 13:21:28 test kernel:  [<c012b199>] autoremove_wake_function+0x0/0x35
Feb 12 13:21:28 test kernel:  [<c026a12a>] tty_ldisc_deref+0x55/0x64
Feb 12 13:21:28 test kernel:  [<c0439389>] mutex_lock+0xb/0x1a
Feb 12 13:21:28 test kernel:  [<c016ed55>] block_llseek+0xad/0xb9
Feb 12 13:21:28 test kernel:  [<c015128c>] do_sync_read+0x0/0x10a
Feb 12 13:21:28 test kernel:  [<c0151b3b>] vfs_read+0x88/0x134
Feb 12 13:21:28 test kernel:  [<c0151f8f>] sys_read+0x41/0x67
Feb 12 13:21:28 test kernel:  [<c0103520>] sysenter_past_esp+0x5d/0x81
Feb 12 13:21:28 test kernel:  [<c0430033>] svc_create_pooled+0x2f/0xf1
Feb 12 13:21:28 test kernel:  =======================
Feb 12 13:21:28 test kernel: Code: 8b 5b 14 01 c1 11 d3 39 df 72 0a 77 04 39 
ce 72 04 31 c0 eb 1a 39 d7 b1 01 72 08 77 04 39 c6 72 02 31 c9 83 f1 01 0f 
b6 c1 eb 04 <0f> 0b eb fe 5b 5e 5f c3 55 89 d1 57 56 53 83 ec 2c 89 44 24 10
Feb 12 13:21:28 test kernel: EIP: [<f88d70b9>] contains_sector+0x40/0x48 
[dm_loop] SS:ESP 0068:c917fbbc
Feb 12 13:21:28 test kernel:  <0>------------[ cut here ]------------
Feb 12 13:21:28 test kernel: kernel BUG at drivers/md/dm-loop.c:260!
Feb 12 13:21:28 test kernel: invalid opcode: 0000 [#2]
Feb 12 13:21:28 test kernel: SMP
Feb 12 13:21:28 test kernel: Modules linked in: dm_loop
Feb 12 13:21:28 test kernel: CPU:    0
Feb 12 13:21:28 test kernel: EIP:    0060:[<f88d70b9>]    Not tainted VLI
Feb 12 13:21:28 test kernel: EFLAGS: 00010246   (2.6.20 #6)
Feb 12 13:21:28 test kernel: EIP is at contains_sector+0x40/0x48 [dm_loop]
Feb 12 13:21:28 test kernel: eax: 00000000   ebx: 00000000   ecx: 00000000 
edx: 00000000
Feb 12 13:21:28 test kernel: esi: 00000000   edi: 00000000   ebp: f1c46260 
esp: c917f6b0
Feb 12 13:21:28 test kernel: ds: 007b   es: 007b   ss: 0068
Feb 12 13:21:28 test kernel: Process mkfs.reiserfs (pid: 9718, ti=c917e000 
task=e16da550 task.ti=c917e000)
Feb 12 13:21:28 test kernel: Stack: dead83f0 f88d70c1 e39f1a88 f88d710e 
00001000 00000001 00000000 df62b880
Feb 12 13:21:28 test kernel:        f8c0f080 00000000 00000000 ef6006c0 
f1c4626c df62b880 c016e761 f8c0f080
Feb 12 13:21:28 test kernel:        f88d70c1 e39f1a88 f8c0f080 f88d706b 
f487f9c0 df62b880 c0399f1a 00000000
Feb 12 13:21:28 test kernel: Call Trace:
Feb 12 13:21:28 test kernel:  [<f88d70c1>] loop_block_map+0x0/0x180 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<f88d710e>] loop_block_map+0x4d/0x180 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<c016e761>] __bio_clone+0x6f/0x8a
Feb 12 13:21:28 test kernel:  [<f88d70c1>] loop_block_map+0x0/0x180 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<f88d706b>] loop_map+0x27/0x35 [dm_loop]
Feb 12 13:21:28 test kernel:  [<c0399f1a>] __map_bio+0x34/0x99
Feb 12 13:21:28 test kernel:  [<c039a783>] __split_bio+0x173/0x41f
Feb 12 13:21:28 test kernel:  [<c0222c22>] memmove+0xe/0x22
Feb 12 13:21:28 test kernel:  [<c02788cb>] vt_console_print+0x20f/0x220
Feb 12 13:21:28 test kernel:  [<c039b113>] dm_request+0xc7/0xd4
Feb 12 13:21:28 test kernel:  [<c02159b9>] generic_make_request+0x1ae/0x1be
Feb 12 13:21:28 test kernel:  [<c02786bc>] vt_console_print+0x0/0x220
Feb 12 13:21:28 test kernel:  [<c011b22b>] __call_console_drivers+0x4f/0x5b
Feb 12 13:21:28 test kernel:  [<c02179a3>] submit_bio+0xc0/0xc7
Feb 12 13:21:28 test kernel:  [<c01398b5>] mempool_alloc+0x28/0xc6
Feb 12 13:21:28 test kernel:  [<c016dea9>] bio_alloc_bioset+0x9b/0xf3
Feb 12 13:21:28 test kernel:  [<c016b0bc>] submit_bh+0xd8/0xf6
Feb 12 13:21:28 test kernel:  [<c016c4a2>] 
__block_write_full_page+0x215/0x308
Feb 12 13:21:28 test kernel:  [<c016fa14>] blkdev_get_block+0x0/0x43
Feb 12 13:21:28 test kernel:  [<c016c802>] block_write_full_page+0xce/0xd6
Feb 12 13:21:28 test kernel:  [<c016fa14>] blkdev_get_block+0x0/0x43
Feb 12 13:21:28 test kernel:  [<c013bc3f>] generic_writepages+0x17d/0x2aa
Feb 12 13:21:28 test kernel:  [<c016eea8>] blkdev_writepage+0x0/0xc
Feb 12 13:21:28 test kernel:  [<c013bac2>] generic_writepages+0x0/0x2aa
Feb 12 13:21:28 test kernel:  [<c013bd8c>] do_writepages+0x20/0x30
Feb 12 13:21:28 test kernel:  [<c0137df7>] 
__filemap_fdatawrite_range+0x65/0x71
Feb 12 13:21:28 test kernel:  [<c0138026>] filemap_fdatawrite+0x23/0x27
Feb 12 13:21:28 test kernel:  [<c013803b>] filemap_write_and_wait+0x11/0x29
Feb 12 13:21:28 test kernel:  [<c016f269>] __blkdev_put+0x3d/0x103
Feb 12 13:21:28 test kernel:  [<c0152281>] __fput+0xa5/0x14d
Feb 12 13:21:28 test kernel:  [<c014fed6>] filp_close+0x51/0x58
Feb 12 13:21:28 test kernel:  [<c011caa7>] put_files_struct+0x5f/0xa2
Feb 12 13:21:28 test kernel:  [<c011da3b>] do_exit+0x1dc/0x69b
Feb 12 13:21:28 test kernel:  [<c011007b>] __modify_IO_APIC_irq+0x4a/0x5a
Feb 12 13:21:28 test kernel:  [<c0104990>] die+0x1f1/0x216
Feb 12 13:21:28 test kernel:  [<c0104d9d>] do_invalid_op+0x0/0xab
Feb 12 13:21:28 test kernel:  [<c0104e3f>] do_invalid_op+0xa2/0xab
Feb 12 13:21:28 test kernel:  [<f88d70b9>] contains_sector+0x40/0x48 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<c011bb3e>] printk+0x1b/0x1f
Feb 12 13:21:28 test kernel:  [<c011bba8>] __printk_ratelimit+0x66/0xa0
Feb 12 13:21:28 test kernel:  [<c016c688>] end_bio_bh_io_sync+0x35/0x39
Feb 12 13:21:28 test kernel:  [<c016dfaa>] bio_endio+0x5b/0x63
Feb 12 13:21:28 test kernel:  [<c043a1ec>] error_code+0x7c/0x84
Feb 12 13:21:28 test kernel:  [<c03900d8>] ps2_handle_response+0xe/0x6f
Feb 12 13:21:28 test kernel:  [<f88d70b9>] contains_sector+0x40/0x48 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<f88d70c1>] loop_block_map+0x0/0x180 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<f88d710e>] loop_block_map+0x4d/0x180 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<c016e761>] __bio_clone+0x6f/0x8a
Feb 12 13:21:28 test kernel:  [<f88d70c1>] loop_block_map+0x0/0x180 
[dm_loop]
Feb 12 13:21:28 test kernel:  [<f88d706b>] loop_map+0x27/0x35 [dm_loop]
Feb 12 13:21:28 test kernel:  [<c0399f1a>] __map_bio+0x34/0x99
Feb 12 13:21:28 test kernel:  [<c039a783>] __split_bio+0x173/0x41f
Feb 12 13:21:28 test kernel:  [<c014e7fe>] cache_alloc_refill+0x5b/0x488
Feb 12 13:21:28 test kernel:  [<c02179a3>] submit_bio+0xc0/0xc7
Feb 12 13:21:28 test kernel:  [<c039b113>] dm_request+0xc7/0xd4
Feb 12 13:21:28 test kernel:  [<c02159b9>] generic_make_request+0x1ae/0x1be
Feb 12 13:21:28 test kernel:  [<c016ddf8>] bio_put+0x23/0x24
Feb 12 13:21:28 test kernel:  [<c016b0cd>] submit_bh+0xe9/0xf6
Feb 12 13:21:28 test kernel:  [<c016fa14>] blkdev_get_block+0x0/0x43
Feb 12 13:21:28 test kernel:  [<c02179a3>] submit_bio+0xc0/0xc7
Feb 12 13:21:28 test kernel:  [<c01398b5>] mempool_alloc+0x28/0xc6
Feb 12 13:21:28 test kernel:  [<c013d079>] __pagevec_lru_add+0x85/0x90
Feb 12 13:21:28 test kernel:  [<c016dea9>] bio_alloc_bioset+0x9b/0xf3
Feb 12 13:21:28 test kernel:  [<c016b0bc>] submit_bh+0xd8/0xf6
Feb 12 13:21:28 test kernel:  [<c016d686>] block_read_full_page+0x2c8/0x2d9
Feb 12 13:21:28 test kernel:  [<c016fa14>] blkdev_get_block+0x0/0x43
Feb 12 13:21:28 test kernel:  [<c013cb63>] page_cache_readahead+0x12b/0x196
Feb 12 13:21:28 test kernel:  [<c01378bf>] 
do_generic_mapping_read+0x209/0x43b
Feb 12 13:21:28 test kernel:  [<c013971b>] generic_file_aio_read+0x19d/0x1ce
Feb 12 13:21:28 test kernel:  [<c0136fb8>] file_read_actor+0x0/0xd1
Feb 12 13:21:28 test kernel:  [<c0151353>] do_sync_read+0xc7/0x10a
Feb 12 13:21:28 test kernel:  [<c012b2ef>] remove_wait_queue+0xc/0x34
Feb 12 13:21:28 test kernel:  [<c012b199>] autoremove_wake_function+0x0/0x35
Feb 12 13:21:28 test kernel:  [<c026a12a>] tty_ldisc_deref+0x55/0x64
Feb 12 13:21:28 test kernel:  [<c0439389>] mutex_lock+0xb/0x1a
Feb 12 13:21:28 test kernel:  [<c016ed55>] block_llseek+0xad/0xb9
Feb 12 13:21:28 test kernel:  [<c015128c>] do_sync_read+0x0/0x10a
Feb 12 13:21:28 test kernel:  [<c0151b3b>] vfs_read+0x88/0x134
Feb 12 13:21:28 test kernel:  [<c0151f8f>] sys_read+0x41/0x67
Feb 12 13:21:28 test kernel:  [<c0103520>] sysenter_past_esp+0x5d/0x81
Feb 12 13:21:28 test kernel:  [<c0430033>] svc_create_pooled+0x2f/0xf1
Feb 12 13:21:28 test kernel:  =======================
Feb 12 13:21:28 test kernel: Code: 8b 5b 14 01 c1 11 d3 39 df 72 0a 77 04 39 
ce 72 04 31 c0 eb 1a 39 d7 b1 01 72 08 77 04 39 c6 72 02 31 c9 83 f1 01 0f 
b6 c1 eb 04 <0f> 0b eb fe 5b 5e 5f c3 55 89 d1 57 56 53 83 ec 2c 89 44 24 10
Feb 12 13:21:28 test kernel: EIP: [<f88d70b9>] contains_sector+0x40/0x48 
[dm_loop] SS:ESP 0068:c917f6b0
Feb 12 13:21:28 test kernel:  <1>Fixing recursive fault but reboot is 
needed!
Feb 12 13:22:02 test kernel: ------------[ cut here ]------------
Feb 12 13:22:02 test kernel: kernel BUG at drivers/md/dm-loop.c:260!
Feb 12 13:22:02 test kernel: invalid opcode: 0000 [#3]
Feb 12 13:22:02 test kernel: SMP
Feb 12 13:22:02 test kernel: Modules linked in: dm_loop
Feb 12 13:22:02 test kernel: CPU:    0
Feb 12 13:22:02 test kernel: EIP:    0060:[<f88d70b9>]    Not tainted VLI
Feb 12 13:22:02 test kernel: EFLAGS: 00010246   (2.6.20 #6)
Feb 12 13:22:02 test kernel: EIP is at contains_sector+0x40/0x48 [dm_loop]
Feb 12 13:22:02 test kernel: eax: 00000000   ebx: 00000000   ecx: 00000000 
edx: 00000008
Feb 12 13:22:02 test kernel: esi: 00000008   edi: 00000000   ebp: f1c46260 
esp: ef9e5be8
Feb 12 13:22:02 test kernel: ds: 007b   es: 007b   ss: 0068
Feb 12 13:22:02 test kernel: Process pdflush (pid: 20365, ti=ef9e4000 
task=f494ca70 task.ti=ef9e4000)
Feb 12 13:22:02 test kernel: Stack: dead83f0 f88d70c1 f3485b78 f88d710e 
00001000 00000001 00000000 c2a4a100
Feb 12 13:22:02 test kernel:        f8c0f080 00000008 00000000 ef6006c0 
f1c4626c c2a4a100 c016e761 f8c0f080
Feb 12 13:22:02 test kernel:        f88d70c1 f3485b78 f8c0f080 f88d706b 
f487f9c0 c2a4a100 c0399f1a 00000008
Feb 12 13:22:02 test kernel: Call Trace:
Feb 12 13:22:02 test kernel:  [<f88d70c1>] loop_block_map+0x0/0x180 
[dm_loop]
Feb 12 13:22:02 test kernel:  [<f88d710e>] loop_block_map+0x4d/0x180 
[dm_loop]
Feb 12 13:22:02 test kernel:  [<c016e761>] __bio_clone+0x6f/0x8a
Feb 12 13:22:02 test kernel:  [<f88d70c1>] loop_block_map+0x0/0x180 
[dm_loop]
Feb 12 13:22:02 test kernel:  [<f88d706b>] loop_map+0x27/0x35 [dm_loop]
Feb 12 13:22:02 test kernel:  [<c0399f1a>] __map_bio+0x34/0x99
Feb 12 13:22:02 test kernel:  [<c039a783>] __split_bio+0x173/0x41f
Feb 12 13:22:02 test kernel:  [<f88d706b>] loop_map+0x27/0x35 [dm_loop]
Feb 12 13:22:02 test kernel:  [<c0399f1a>] __map_bio+0x34/0x99
Feb 12 13:22:02 test kernel:  [<c039b113>] dm_request+0xc7/0xd4
Feb 12 13:22:02 test kernel:  [<c02159b9>] generic_make_request+0x1ae/0x1be
Feb 12 13:22:02 test kernel:  [<c02179a3>] submit_bio+0xc0/0xc7
Feb 12 13:22:02 test kernel:  [<c01398b5>] mempool_alloc+0x28/0xc6
Feb 12 13:22:02 test kernel:  [<c016dea9>] bio_alloc_bioset+0x9b/0xf3
Feb 12 13:22:02 test kernel:  [<c016b0bc>] submit_bh+0xd8/0xf6
Feb 12 13:22:02 test kernel:  [<c016c4a2>] 
__block_write_full_page+0x215/0x308
Feb 12 13:22:02 test kernel:  [<c016fa14>] blkdev_get_block+0x0/0x43
Feb 12 13:22:02 test kernel:  [<c016c802>] block_write_full_page+0xce/0xd6
Feb 12 13:22:02 test kernel:  [<c016fa14>] blkdev_get_block+0x0/0x43
Feb 12 13:22:02 test kernel:  [<c013bc3f>] generic_writepages+0x17d/0x2aa
Feb 12 13:22:02 test kernel:  [<c016eea8>] blkdev_writepage+0x0/0xc
Feb 12 13:22:02 test kernel:  [<c013bac2>] generic_writepages+0x0/0x2aa
Feb 12 13:22:02 test kernel:  [<c013bd8c>] do_writepages+0x20/0x30
Feb 12 13:22:02 test kernel:  [<c0167e97>] 
__writeback_single_inode+0x1ae/0x31e
Feb 12 13:22:02 test kernel:  [<c01682e3>] sync_sb_inodes+0x169/0x212
Feb 12 13:22:02 test kernel:  [<c01686c3>] writeback_inodes+0x63/0xa4
Feb 12 13:22:02 test kernel:  [<c013c1f0>] wb_kupdate+0x94/0xf8
Feb 12 13:22:02 test kernel:  [<c013c4aa>] pdflush+0x0/0x19d
Feb 12 13:22:02 test kernel:  [<c013c5b3>] pdflush+0x109/0x19d
Feb 12 13:22:02 test kernel:  [<c013c15c>] wb_kupdate+0x0/0xf8
Feb 12 13:22:02 test kernel:  [<c012b0d0>] kthread+0xb0/0xd8
Feb 12 13:22:02 test kernel:  [<c012b020>] kthread+0x0/0xd8
Feb 12 13:22:02 test kernel:  [<c010416f>] kernel_thread_helper+0x7/0x10
Feb 12 13:22:02 test kernel:  =======================
Feb 12 13:22:02 test kernel: Code: 8b 5b 14 01 c1 11 d3 39 df 72 0a 77 04 39 
ce 72 04 31 c0 eb 1a 39 d7 b1 01 72 08 77 04 39 c6 72 02 31 c9 83 f1 01 0f 
b6 c1 eb 04 <0f> 0b eb fe 5b 5e 5f c3 55 89 d1 57 56 53 83 ec 2c 89 44 24 10
Feb 12 13:22:02 test kernel: EIP: [<f88d70b9>] contains_sector+0x40/0x48 
[dm_loop] SS:ESP 0068:ef9e5be8
Feb 12 13:24:48 test kernel:  <0>------------[ cut here ]------------
Feb 12 13:24:48 test kernel: kernel BUG at drivers/md/dm-loop.c:260!
Feb 12 13:24:48 test kernel: invalid opcode: 0000 [#4]
Feb 12 13:24:48 test kernel: SMP
Feb 12 13:24:48 test kernel: Modules linked in: dm_loop
--snipp---
(i can provide more of the log on request)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2.6.20] updated dm-loop patch
  2007-02-12  8:49 [PATCH 2.6.20] updated dm-loop patch roland
@ 2007-02-12 15:24 ` Bryn M. Reeves
  0 siblings, 0 replies; 9+ messages in thread
From: Bryn M. Reeves @ 2007-02-12 15:24 UTC (permalink / raw)
  To: roland; +Cc: dm-devel

[-- Attachment #1: Type: text/plain, Size: 2588 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

roland wrote:
> Hi Bryn,
> 
> after some first tests which looked very very promising (could dmlosetup
> files >2gb, could create more than 256 loop dm-loop devices...), i have
> bad news.
> maybe it`s not that "bad" because you may be able to fix it quickly, but
> it seems that dm-loop is racy (or whatever). maybe smp safeness, since i
> was testing on a P4 with HT enabled ? i did my first tests on non-smp
> system (VM), but it wasn`t put under that load as i did now.

Hi Roland,

At first sight, this doesn't look SMP related. The backtrace you posted
comes from a BUG() macro in the source that triggers when we can't find
an extent we're looking for in the table.

There's also a rather odd number in the "not using" message:

device-mapper: loop: not using 4294967296 bytes in incomplete block at EOF

So I think for some reason we're not building a correct block map for
this file.

I'll have time to take a proper look at this this evening - while I'm
looking into this one, do you have any other info on the file that gave
the problem:

- - was it a sparse file?
- - was an offset used when creating the device?

If you have the time, I'd also be interested in seeing the following
information:

- - output of dmsetup table <device>
- - debugfs output for the loop file concerned

For the first one, there's no need to perform any I/O to the device
after creating it, so you shouldn't need to trigger the BUG() again - it
might help to kill udevd though, as it will try to run vol_id etc. on
the device otherwise.

For the debugfs data, please run the attached script on the device/file
that had the problem, for e.g.:

do_debugfs.sh /dev/hda3 src/5gig.dat > /tmp/5gig.dat.out

The first argument is the device containing the file and the second is
the path relative to the device's root directory - change the
path/device to suit your system.

This will give a complete block map for the file so I can see what we're
tripping over. For a 5G file this may take a few minutes and the file
will be 50-100k in size - you can send it to me privately rather than
spamming the whole list :)

> ps:
> hey, why not announcing this on lkml, so this gut get some more notice
> or being reviewed by others?

So that we can work these kind of problems out first! ;)

Kind regards,

Bryn.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFF0Iai6YSQoMYUY94RAiNmAKC99G8+slz+tVDnSkCiVsPN6GVoJwCeJcCg
wQ3iLG/ZVzGsualWfcmVv34=
=y8/d
-----END PGP SIGNATURE-----

[-- Attachment #2: do_debugfs.sh --]
[-- Type: application/x-shellscript, Size: 286 bytes --]

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2.6.20] updated dm-loop patch
  2007-02-15 12:28 ` Bryn M. Reeves
@ 2007-02-21  0:14   ` roland
  0 siblings, 0 replies; 9+ messages in thread
From: roland @ 2007-02-21  0:14 UTC (permalink / raw)
  To: Bryn M. Reeves; +Cc: device-mapper development

Hi !

Just wanted to let you know that i'm running dm-loop for some days now, 
being very lucky with it and not having any problems !  :)

Good work, Bryn!

regards
Roland

ps:
There`s only one issue being left:
It`s not in mainline ;)



----- Original Message ----- 
From: "Bryn M. Reeves" <breeves@redhat.com>
To: <devzero@web.de>
Cc: "device-mapper development" <dm-devel@redhat.com>
Sent: Thursday, February 15, 2007 1:28 PM
Subject: Re: [dm-devel] [PATCH 2.6.20] updated dm-loop patch


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> devzero@web.de wrote:
>> Hi Bryn,
>>
>> with this patch and also with 0.415 i have the following problem when 
>> compiling it with stock 2.6.20 :
>>
>>   Building modules, stage 2.
>>   MODPOST 1 modules
>> WARNING: "invalidate_mapping_pages" [drivers/md/dm-loop.ko] undefined!
>> make[1]: *** [__modpost] Error 1
>> make: *** [modules] Error 2
>>
>> i found , that this was due to missing
>>
>> EXPORT_SYMBOL(invalidate_mapping_pages);
>>
>> in mm/truncate.c
>>
>> i found
>> http://lkml.org/lkml/2007/1/3/154
>>
>> it looks that this didn`t go into 2.6.20 and we need at least 
>> 2.6.20-git11 ?
>> changelog at 
>> http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.20-git11.log 
>> telling that it had just had been merged on 10th of february:
>>
>
> Hi Roland,
>
> That's right - I keep my git tree fairly close to upstream, so as soon
> as the patch that deprecated invalidate_inode_pages was merged, I
> converted dm-loop to use invalidate_mapping_pages instead.
>
> This means that dm-loop will build without warnings on the latest
> kernel.org tree but it does mean that you will need the additional patch
> you referenced if you want to apply it to a plain 2.6.20 kernel.
>
> Kind regards,
>
> Bryn.
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org
>
> iD8DBQFF1FHa6YSQoMYUY94RAl2iAKDAlrTqxDCNP7i/bhexl6JJGW1rNwCgksXz
> 5limiJOmDRoBKdHDUsU0pFE=
> =HwdH
> -----END PGP SIGNATURE----- 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2.6.20] updated dm-loop patch
@ 2007-02-15 22:30 devzero
  0 siblings, 0 replies; 9+ messages in thread
From: devzero @ 2007-02-15 22:30 UTC (permalink / raw)
  To: Bryn M. Reeves; +Cc: device-mapper development

hi again, 

just wondering - would it make sense to have dm-loop supporting partitions "out of the box" ?

i gave it a try, but the appropriate ioctls seem to be missing.

Command (m for help): p

Disk /dev/mapper/loop0: 104 MB, 104857600 bytes
255 heads, 63 sectors/track, 12 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

             Device Boot      Start         End      Blocks   Id  System
/dev/mapper/loop0p1               1           6       48163+  83  Linux
/dev/mapper/loop0p2               7          12       48195   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 22: Invalid argument.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.


fdisk providing wrong information here, because removing and re-adding loop0 doesn`t make loop0p1 or loop0p2 appear.

there is some nice script at http://www.ussg.iu.edu/hypermail/linux/kernel/0307.2/0935.html which makes those partitions available by creating additional devices, but i`m just wondering if this could work automa(t|g)ically.

regards
roland




> -----Ursprüngliche Nachricht-----
> Von: "Bryn M. Reeves" <breeves@redhat.com>
> Gesendet: 15.02.07 13:28:24
> An: devzero@web.de
> CC: device-mapper development <dm-devel@redhat.com>
> Betreff: Re: [dm-devel] [PATCH 2.6.20] updated dm-loop patch


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> devzero@web.de wrote:
> > Hi Bryn, 
> > 
> > with this patch and also with 0.415 i have the following problem when compiling it with stock 2.6.20 :
> > 
> >   Building modules, stage 2.
> >   MODPOST 1 modules
> > WARNING: "invalidate_mapping_pages" [drivers/md/dm-loop.ko] undefined!
> > make[1]: *** [__modpost] Error 1
> > make: *** [modules] Error 2
> > 
> > i found , that this was due to missing 
> > 
> > EXPORT_SYMBOL(invalidate_mapping_pages);
> > 
> > in mm/truncate.c 
> > 
> > i found 
> > http://lkml.org/lkml/2007/1/3/154
> > 
> > it looks that this didn`t go into 2.6.20 and we need at least 2.6.20-git11 ?
> > changelog at http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.20-git11.log telling that it had just had been merged on 10th of february:
> > 
> 
> Hi Roland,
> 
> That's right - I keep my git tree fairly close to upstream, so as soon
> as the patch that deprecated invalidate_inode_pages was merged, I
> converted dm-loop to use invalidate_mapping_pages instead.
> 
> This means that dm-loop will build without warnings on the latest
> kernel.org tree but it does mean that you will need the additional patch
> you referenced if you want to apply it to a plain 2.6.20 kernel.
> 
> Kind regards,
> 
> Bryn.
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org
> 
> iD8DBQFF1FHa6YSQoMYUY94RAl2iAKDAlrTqxDCNP7i/bhexl6JJGW1rNwCgksXz
> 5limiJOmDRoBKdHDUsU0pFE=
> =HwdH
> -----END PGP SIGNATURE-----
> 


_______________________________________________________________________
Viren-Scan für Ihren PC! Jetzt für jeden. Sofort, online und kostenlos.
Gleich testen! http://www.pc-sicherheit.web.de/freescan/?mc=022222

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2.6.20] updated dm-loop patch
  2007-02-15 11:45 devzero
@ 2007-02-15 12:28 ` Bryn M. Reeves
  2007-02-21  0:14   ` roland
  0 siblings, 1 reply; 9+ messages in thread
From: Bryn M. Reeves @ 2007-02-15 12:28 UTC (permalink / raw)
  To: devzero; +Cc: device-mapper development

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

devzero@web.de wrote:
> Hi Bryn, 
> 
> with this patch and also with 0.415 i have the following problem when compiling it with stock 2.6.20 :
> 
>   Building modules, stage 2.
>   MODPOST 1 modules
> WARNING: "invalidate_mapping_pages" [drivers/md/dm-loop.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
> 
> i found , that this was due to missing 
> 
> EXPORT_SYMBOL(invalidate_mapping_pages);
> 
> in mm/truncate.c 
> 
> i found 
> http://lkml.org/lkml/2007/1/3/154
> 
> it looks that this didn`t go into 2.6.20 and we need at least 2.6.20-git11 ?
> changelog at http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.20-git11.log telling that it had just had been merged on 10th of february:
> 

Hi Roland,

That's right - I keep my git tree fairly close to upstream, so as soon
as the patch that deprecated invalidate_inode_pages was merged, I
converted dm-loop to use invalidate_mapping_pages instead.

This means that dm-loop will build without warnings on the latest
kernel.org tree but it does mean that you will need the additional patch
you referenced if you want to apply it to a plain 2.6.20 kernel.

Kind regards,

Bryn.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFF1FHa6YSQoMYUY94RAl2iAKDAlrTqxDCNP7i/bhexl6JJGW1rNwCgksXz
5limiJOmDRoBKdHDUsU0pFE=
=HwdH
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2.6.20] updated dm-loop patch
@ 2007-02-15 11:45 devzero
  2007-02-15 12:28 ` Bryn M. Reeves
  0 siblings, 1 reply; 9+ messages in thread
From: devzero @ 2007-02-15 11:45 UTC (permalink / raw)
  To: Bryn M. Reeves, device-mapper development

Hi Bryn, 

with this patch and also with 0.415 i have the following problem when compiling it with stock 2.6.20 :

  Building modules, stage 2.
  MODPOST 1 modules
WARNING: "invalidate_mapping_pages" [drivers/md/dm-loop.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

i found , that this was due to missing 

EXPORT_SYMBOL(invalidate_mapping_pages);

in mm/truncate.c 

i found 
http://lkml.org/lkml/2007/1/3/154

it looks that this didn`t go into 2.6.20 and we need at least 2.6.20-git11 ?
changelog at http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.20-git11.log telling that it had just had been merged on 10th of february:


commit 54bc485522afdac33de5504da2ea8cdcc690674e
Author: Anton Altaparmakov <aia21@cam.ac.uk>
Date:   Sat Feb 10 01:45:38 2007 -0800

    [PATCH] Export invalidate_mapping_pages() to modules
    
    It makes no sense to me to export invalidate_inode_pages() and not
    invalidate_mapping_pages() and I actually need invalidate_mapping_pages()
    because of its range specification ability...
    
    akpm: also remove the export of invalidate_inode_pages() by making it an
    inlined wrapper.
    

so, we need 2.6.20-git11 for dm-loop or we would need to modify mm/truncate.c manually !?

regards
roland




> -----Ursprüngliche Nachricht-----
> Von: "Bryn M. Reeves" <breeves@redhat.com>
> Gesendet: 15.02.07 01:52:12
> An: device-mapper development <dm-devel@redhat.com>
> Betreff: Re: [dm-devel] [PATCH 2.6.20] updated dm-loop patch


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Bryn M. Reeves wrote:
> > This version of the patch fixes a couple of problems that Roland found
> > with file offsets & the use of some conversion routines from dm.h:
> 
> Unfortunately, it also added a new bug: in backing out some other
> changes I'd accidentally reverted to a version of the patch with some
> experimental changes to the file I/O workqueue. This was incomplete and
> harms performance for file mapped loop devices.
> 
> The attached version changes this back to the previous per-loop device
> workqueue.
> 
> Apologies for the confusion.
> 
> Kind regards,
> 
> Bryn.
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org
> 
> iD8DBQFF066R6YSQoMYUY94RArhtAJ9J0Cc5o+Hg3NjzX8iikrIli9UlYgCggLaE
> /hJdDhTkiybsmfxz8SdLVko=
> =4pK0
> -----END PGP SIGNATURE-----
> 
> 


_____________________________________________________________________
Der WEB.DE SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
http://smartsurfer.web.de/?mc=100071&distributionid=000000000066

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2.6.20] updated dm-loop patch
  2007-02-13 20:20 ` Bryn M. Reeves
@ 2007-02-15  0:51   ` Bryn M. Reeves
  0 siblings, 0 replies; 9+ messages in thread
From: Bryn M. Reeves @ 2007-02-15  0:51 UTC (permalink / raw)
  To: device-mapper development; +Cc: roland

[-- Attachment #1: Type: text/plain, Size: 855 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bryn M. Reeves wrote:
> This version of the patch fixes a couple of problems that Roland found
> with file offsets & the use of some conversion routines from dm.h:

Unfortunately, it also added a new bug: in backing out some other
changes I'd accidentally reverted to a version of the patch with some
experimental changes to the file I/O workqueue. This was incomplete and
harms performance for file mapped loop devices.

The attached version changes this back to the previous per-loop device
workqueue.

Apologies for the confusion.

Kind regards,

Bryn.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFF066R6YSQoMYUY94RArhtAJ9J0Cc5o+Hg3NjzX8iikrIli9UlYgCggLaE
/hJdDhTkiybsmfxz8SdLVko=
=4pK0
-----END PGP SIGNATURE-----

[-- Attachment #2: dm-loop.patch --]
[-- Type: text/x-patch, Size: 24821 bytes --]

This implements a loopback target for device mapper allowing a regular
file to be treated as a block device.

Signed-off-by: Bryn Reeves <breeves@redhat.com>

===================================================================
diff --git a/drivers/md/dm-loop.c b/drivers/md/dm-loop.c
new file mode 100644
index 0000000..1c00812
--- /dev/null
+++ b/drivers/md/dm-loop.c
@@ -0,0 +1,1030 @@
+/*
+ * Copyright (C) 2006 Red Hat, Inc. All rights reserved.
+ *
+ * This file is part of device-mapper.
+ *
+ * drivers/md/dm-loop.c
+ *
+ * Extent mapping implementation heavily influenced by mm/swapfile.c
+ * Bryn Reeves <breeves@redhat.com>
+ *
+ * File mapping and block lookup algorithms support by
+ * Heinz Mauelshagen <hjm@redhat.com>.
+ *
+ * This file is released under the GPL.
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/syscalls.h>
+#include <linux/workqueue.h>
+#include <linux/file.h>
+#include <linux/bio.h>
+
+#include "dm.h"
+#include "dm-bio-list.h"
+
+static const char *version = "v0.416";
+#define DAEMON "kloopd"
+#define DM_MSG_PREFIX "loop"
+
+enum flags { BLOCK_TYPE, FILE_TYPE, VMALLOC };
+
+/*--------------------------------------------------------------------
+ * Loop context
+ *--------------------------------------------------------------------*/
+
+struct loop_c {
+	unsigned long flags;
+
+	/* information describing the backing store */
+	struct file *filp;		/* loop file handle */
+	char *path;			/* path argument */
+	loff_t offset;			/* offset argument */
+	struct block_device *bdev;	/* block device */
+	unsigned blkbits;		/* file system block size shift bits */
+
+	loff_t size;			/* size of entire file in bytes */
+	loff_t blocks;			/* blocks allocated to loop file */
+	sector_t mapped_sectors;	/* size of mapped area in sectors*/
+
+	/* mapping */
+	int (*map_fn)(struct dm_target*, struct bio*);
+	/* mapping function private data */
+	void *map_data;
+};
+
+/*
+ * block map extents
+ */
+struct extent {
+	sector_t start;
+	sector_t to;
+	sector_t len;
+};
+
+struct extent_list {
+	struct extent * extent;
+	struct list_head list;
+};
+
+struct kmem_cache *extent_cache;
+
+/*
+ * block map private context
+ */
+struct block_map_c {
+	int nr_extents;			/* number of extents in map */
+	struct extent **map;		/* linear map of extent pointers */
+	struct extent **mru;		/* pointer to meu entry */
+	spinlock_t mru_lock;		/* protects mru */
+};
+
+/*
+ * file map private context
+ */
+struct file_map_c {
+	spinlock_t lock;		/* protects in */
+	struct bio_list in;		/* new bios for processing */
+	struct bio_list work;		/* bios queued for processing */
+	struct workqueue_struct *wq;	/* workqueue */
+	struct work_struct ws;		/* loop work */
+	struct loop_c *loop;		/* for filp & offset */
+};
+
+/*--------------------------------------------------------------------
+ * Generic helper routines
+ *--------------------------------------------------------------------*/
+
+static inline sector_t blk2sec(struct loop_c *lc, blkcnt_t block)
+{
+	return block << (lc->blkbits - SECTOR_SHIFT);
+}
+
+static inline blkcnt_t sec2blk(struct loop_c *lc, sector_t sector)
+{
+	return sector >> (lc->blkbits - SECTOR_SHIFT);
+}
+
+/*--------------------------------------------------------------------
+ * File I/O helper routines
+ *--------------------------------------------------------------------*/
+
+/*
+ * transfer data to/from file.
+ */
+static int fs_io(int rw, struct file *filp, loff_t *pos,
+		     struct bio_vec *bv)
+{
+	ssize_t r;
+	void *ptr = kmap(bv->bv_page) + bv->bv_offset;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(get_ds());
+	r = (rw == READ) ? filp->f_op->read(filp, ptr, bv->bv_len, pos) :
+			   filp->f_op->write(filp, ptr, bv->bv_len, pos);
+	set_fs(old_fs);
+	kunmap(bv->bv_page);
+	return r == bv->bv_len ? 0 : -EIO;
+}
+
+/*
+ * Handle IO for one bio
+ */
+static void do_one_bio(struct file_map_c *fc, struct bio *bio)
+{
+	int r = 0, rw = bio_data_dir(bio);
+	loff_t start = (bio->bi_sector << 9) + fc->loop->offset,
+		pos = start;
+	struct bio_vec *bv, *bv_end = bio->bi_io_vec + bio->bi_vcnt;
+
+	for (bv = bio->bi_io_vec; bv < bv_end; bv++) {
+		r = fs_io(rw, fc->loop->filp, &pos, bv);
+		if (r) {
+			DMWARN("%s error %d", rw ? "write":"read" , r);
+			break;
+		}
+	}
+
+	bio_endio(bio, pos - start, r);
+}
+
+/*
+ * Worker thread for a 'file' type loop device
+ */
+static void do_loop_work(struct work_struct *ws)
+{
+	struct file_map_c *fc = container_of(ws, struct file_map_c, ws);
+	struct bio *bio;
+
+	/* quickly grab all new ios queued and add them to the work list */
+	spin_lock_irq(&fc->lock);
+	bio_list_merge_init(&fc->work, &fc->in);
+	spin_unlock_irq(&fc->lock);
+
+	/* work the list and do file I/O on all bios */
+	while ((bio = bio_list_pop(&fc->work)))
+		do_one_bio(fc, bio);
+}
+
+/*
+ * Create work queue and initialize work
+ */
+static int loop_work_init(struct loop_c *lc)
+{
+	struct file_map_c *fc = lc->map_data;
+	fc->wq = create_singlethread_workqueue(DAEMON);
+	if (!fc->wq)
+		return -ENOMEM;
+
+	return 0;
+}
+
+/*
+ * Destroy work queue
+ */
+static void loop_work_exit(struct loop_c *lc)
+{
+	struct file_map_c *fc = lc->map_data;
+	if (fc->wq)
+		destroy_workqueue(fc->wq);
+}
+
+/*
+ * FILE_TYPE map_fn. Mapping just queues ios to the file map
+ * context and lets the daemon deal with them.
+ */
+static int loop_file_map(struct dm_target *ti, struct bio *bio)
+{
+	int wake;
+	struct loop_c *lc = ti->private;
+	struct file_map_c *fc = lc->map_data;
+
+	spin_lock_irq(&fc->lock);
+	wake = bio_list_empty(&fc->in);
+	bio_list_add(&fc->in, bio);
+	spin_unlock_irq(&fc->lock);
+
+	/*
+	 * only call queue_work() if necessary to avoid
+	 * superfluous preempt_{disable/enable}() overhead.
+	 */
+	if (wake)
+		queue_work(fc->wq, &fc->ws);
+
+	/* handling bio -> will submit later */
+	return 0;
+}
+
+static void destroy_file_map(struct loop_c *lc)
+{
+	loop_work_exit(lc);
+	kfree(lc->map_data);
+}
+
+/*
+ * Set up a file map context and workqueue
+ */
+static int setup_file_map(struct loop_c *lc)
+{
+	struct file_map_c *fc = kzalloc(sizeof(*fc), GFP_KERNEL);
+	if (!fc)
+		return -ENOMEM;
+	lc->map_data = fc;
+	spin_lock_init(&fc->lock);
+	bio_list_init(&fc->in);
+	bio_list_init(&fc->work);
+	INIT_WORK(&fc->ws, do_loop_work);
+	fc->loop = lc;
+
+	lc->map_fn = loop_file_map;
+	return loop_work_init(lc);
+}
+
+/*--------------------------------------------------------------------
+ * Block I/O helper routines
+ *--------------------------------------------------------------------*/
+
+static int contains_sector(struct extent *e, sector_t s)
+{
+	if (likely(e))
+		return s < (e->start + (e->len)) &&
+			s >= e->start;
+	BUG();
+	return 0;
+}
+
+/*
+ * Return an extent range (i.e. beginning+ending physical block numbers).
+ */
+static int extent_range(struct inode * inode,
+			blkcnt_t logical_blk, blkcnt_t last_blk,
+			blkcnt_t *begin_blk, blkcnt_t *end_blk)
+{
+	sector_t dist = 0, phys_blk, probe_blk = logical_blk;
+
+	/* Find beginning physical block of extent starting at logical_blk. */
+	*begin_blk = phys_blk = bmap(inode, probe_blk);
+	if (!phys_blk)
+		return -ENXIO;
+
+	for (; phys_blk == *begin_blk + dist; dist++) {
+		*end_blk = phys_blk;
+		if (++probe_blk > last_blk)
+			break;
+
+		phys_blk = bmap(inode, probe_blk);
+		if (unlikely(!phys_blk))
+			return -ENXIO;
+	}
+
+	return 0;
+
+}
+
+/*
+ * Walk over a linked list of extent_list structures, freeing them as
+ * we go. Does not free el->extent.
+ */
+static void destroy_extent_list(struct list_head *head)
+{
+	struct list_head *curr, *n;
+
+	if (list_empty(head))
+		return;
+
+	list_for_each_safe(curr, n, head) {
+		struct extent_list *el;
+		el = list_entry(curr, struct extent_list, list);
+		list_del(curr);
+		kfree(el);
+	}
+}
+
+/*
+ * Add a new extent to the tail of the list at *head with
+ * start/to/len parameters. Allocates from the extent cache.
+ */
+static int list_add_extent(struct list_head *head,
+		sector_t start, sector_t to, sector_t len)
+{
+	struct extent *extent;
+	struct extent_list *list;
+
+	if (!(extent = kmem_cache_alloc(extent_cache, GFP_KERNEL)))
+		goto out;
+
+	if (!(list = kmalloc(sizeof(*list), GFP_KERNEL)))
+		goto out;
+
+	extent->start = start;
+	extent->to = to;
+	extent->len = len;
+
+	list->extent = extent;
+
+	list_add_tail(&list->list, head);
+
+	return 0;
+out:
+	if (extent)
+		kmem_cache_free(extent_cache, extent);
+	return -ENOMEM;
+}
+
+/*
+ * Create a sequential list of extents from an inode and return
+ * it in *head. On success the number of extents found is returned,
+ * or -ERRNO on error
+ */
+static int loop_extents(struct loop_c *lc, struct inode *inode,
+			struct list_head *head)
+{
+	sector_t start = 0;
+	int r, nr_extents = 0;
+	blkcnt_t nr_blks = 0, begin_blk = 0, end_blk = 0;
+	blkcnt_t last_blk = sec2blk(lc,
+			(lc->mapped_sectors + (lc->offset >> 9))) - 1;
+
+	blkcnt_t logical_blk = sec2blk(lc, (lc->offset >> 9));
+
+	while (logical_blk <= last_blk) {
+		r = extent_range(inode, logical_blk, last_blk,
+				&begin_blk, &end_blk);
+		if (unlikely(r)) {
+			DMERR("%s has a hole; sparse file detected - "
+				"switching to filesystem I/O", lc->path);
+			clear_bit(BLOCK_TYPE, &lc->flags);
+			set_bit(FILE_TYPE, &lc->flags);
+			return r;
+		}
+
+		nr_blks = 1 + end_blk - begin_blk;
+
+		if (likely(nr_blks)) {
+			r = list_add_extent(head, start,
+				blk2sec(lc, begin_blk),
+				blk2sec(lc, nr_blks));
+
+			if (unlikely(r))
+				return r;
+
+			nr_extents++;
+			start += blk2sec(lc, nr_blks);
+			begin_blk += nr_blks;
+			logical_blk += nr_blks;
+		}
+	}
+
+	return nr_extents;
+}
+
+/*
+ * Walk over the extents in a block_map_c, returning them to the cache and
+ * freeing bc via kfree or vfree as appropriate.
+ */
+static void destroy_block_map(struct block_map_c *bc, int v)
+{
+	int i;
+
+	if (!bc)
+		return;
+
+	for (i = 0; i < bc->nr_extents ; i++) {
+		kmem_cache_free(extent_cache, bc->map[i]);
+	}
+	DMDEBUG("%cfreeing block map of %d entries", (v)?'v':'k', i);
+	if (v)
+		vfree(bc->map);
+	else
+		kfree(bc->map);
+	kfree(bc);
+}
+
+/*
+ * Find an extent in *bc using binary search. Returns a pointer into the
+ * map of linear extent pointers. Calculate index as (extent - bc->map).
+ */
+static struct extent **extent_binary_lookup(struct block_map_c *bc,
+					   struct extent **extent_mru,
+					   sector_t sector)
+{
+	unsigned nr_extents = bc->nr_extents;
+	unsigned delta, dist, prev_dist = 0;
+	struct extent **eptr;
+
+	/* Optimize lookup range based on MRU extent. */
+	dist = extent_mru - bc->map;
+	if ((*extent_mru)->start < sector) {
+		delta = (nr_extents - dist) / 2;
+		dist += delta;
+	} else
+		delta = dist = dist / 2;
+
+	eptr = bc->map + dist;
+	while(*eptr && !contains_sector(*eptr, sector)) {
+		if (sector >= (*eptr)->start + (*eptr)->len) {
+			prev_dist = dist;
+			if (delta > 1)
+				delta /= 2;
+
+			dist += delta;
+		} else {
+			delta = (dist - prev_dist) / 2;
+			if (!delta)
+				delta = 1;
+
+			dist -= delta;
+		}
+		eptr = bc->map + dist;
+	}
+	return eptr;
+}
+
+/*
+ * Lookup an extent for a sector using the mru cache and binary search.
+ */
+static struct extent *extent_lookup(struct block_map_c *bc, sector_t sector)
+{
+	struct extent **eptr;
+
+	/* copy the mru so we can drop the lock quickly */
+	spin_lock_irq(&bc->mru_lock);
+	eptr = bc->mru;
+	spin_unlock_irq(&bc->mru_lock);
+
+	if (contains_sector(*eptr, sector))
+		return *eptr;
+
+	eptr = extent_binary_lookup(bc, eptr, sector);
+	if (!eptr)
+		return NULL;
+
+	spin_lock_irq(&bc->mru_lock);
+	bc->mru = eptr;
+	spin_unlock_irq(&bc->mru_lock);
+	return *eptr;
+}
+
+/*
+ * BLOCK_TYPE map_fn. Looks up the sector in the extent map and
+ * rewrites the bio device and bi_sector fields.
+ */
+static int loop_block_map(struct dm_target *ti, struct bio *bio)
+{
+	struct loop_c *lc = ti->private;
+	struct extent *extent = extent_lookup(lc->map_data, bio->bi_sector);
+
+	if (likely(extent)) {
+		bio->bi_bdev = lc->bdev;
+		bio->bi_sector = extent->to +
+				 (bio->bi_sector - extent->start);
+		return 1;       /* Done with bio -> submit */
+	}
+
+	DMERR("no matching extent in map for sector %llu!",
+	      (unsigned long long) bio->bi_sector + ti->begin);
+	BUG();
+	return -EIO;
+
+}
+
+/*
+ * Turn an extent_list into a linear pointer map of nr_extents + 1 entries
+ * and set the final entry to NULL.
+ */
+static struct extent **build_extent_map(struct list_head *head,
+				int nr_extents, unsigned long *flags)
+{
+	unsigned map_size, cache_size;
+	struct extent **map, **curr;
+	struct list_head *pos;
+
+	map_size = 1 +(sizeof(*map) * nr_extents);
+	cache_size = kmem_cache_size(extent_cache) * nr_extents;
+
+	/* FIXME: arbitrary limit (arch sensitive?)*/
+	if (map_size > (4 * PAGE_SIZE)) {
+		set_bit(VMALLOC, flags);
+		DMDEBUG("using vmalloc for extent map");
+		map = vmalloc(map_size);
+	} else
+		map = kmalloc(map_size, GFP_KERNEL);
+	if (!map)
+		return ERR_PTR(-ENOMEM);
+
+	curr = map;
+
+	DMDEBUG("allocated extent map of %u %s for %d extents (%u %s)",
+		(map_size < 8192 ) ? map_size : map_size >> 10,
+		(map_size < 8192 ) ? "bytes" : "kilobytes", nr_extents,
+		(cache_size < 8192) ? cache_size : cache_size >> 10,
+		(cache_size < 8192) ? "bytes" : "kilobytes");
+
+	list_for_each(pos, head) {
+		struct extent_list *el;
+		el = list_entry(pos, struct extent_list, list);
+		*(curr++) = el->extent;
+	}
+	*curr = NULL;
+	return map;
+}
+
+/*
+ * Set up a block map context and extent map
+ */
+static int setup_block_map(struct loop_c *lc, struct inode *inode)
+{
+	int r, nr_extents;
+	struct block_map_c *bc;
+	LIST_HEAD(head);
+
+	if (!inode || !inode->i_sb || !inode->i_sb->s_bdev) {
+		return -ENXIO;
+	}
+
+	/* build a linked list of extents in linear order */
+	r = loop_extents(lc, inode, &head);
+
+	if (r<0)
+		goto out;
+
+	nr_extents = r;
+	r = -ENOMEM;
+
+	if (!(bc = kzalloc(sizeof(*bc), GFP_KERNEL)))
+		goto out;
+
+	/* create a linear map of pointers into the extent cache */
+	bc->map = build_extent_map(&head, nr_extents, &lc->flags);
+
+	if (IS_ERR(bc->map)) {
+		r = PTR_ERR(bc->map);
+		goto out;
+	}
+
+	destroy_extent_list(&head);
+
+	spin_lock_init(&bc->mru_lock);
+	bc->mru = bc->map;
+	bc->nr_extents = nr_extents;
+
+	lc->bdev = inode->i_sb->s_bdev;
+	lc->map_data = bc;
+	lc->map_fn = loop_block_map ;
+	return 0;
+
+out:
+	destroy_extent_list(&head);
+	return r;
+}
+
+/*--------------------------------------------------------------------
+ * Generic helper routines
+ *--------------------------------------------------------------------*/
+
+/*
+ * Invalidate all unlocked loop file pages
+ */
+static int loop_invalidate_file(struct file *filp)
+{
+	return  invalidate_mapping_pages(filp->f_mapping, 0, ~0UL);
+}
+
+/*
+ * acquire or release a "no-truncate" lock on *filp.
+ * We overload the S_SWAPFILE flag for loop targets because
+ * it provides the same no-truncate semantics we require, and
+ * holding onto i_sem is no longer an option.
+ */
+static void file_truncate_lock(struct file *filp)
+{
+	struct inode *inode = filp->f_mapping->host;
+
+	mutex_lock(&inode->i_mutex);
+	inode->i_flags |= S_SWAPFILE;
+	mutex_unlock(&inode->i_mutex);
+}
+
+static void file_truncate_unlock(struct file *filp)
+{
+	struct inode *inode = filp->f_mapping->host;
+
+	mutex_lock(&inode->i_mutex);
+	inode->i_flags &= ~S_SWAPFILE;
+	mutex_unlock(&inode->i_mutex);
+}
+
+/*
+ * Fill out split_io for taget backing store
+ */
+static void set_split_io(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+	if(test_bit(BLOCK_TYPE,&lc->flags))
+		/* Split I/O at block boundaries */
+		ti->split_io = 1 << (lc->blkbits - SECTOR_SHIFT);
+	else
+		ti->split_io = 64;
+	DMDEBUG("splitting io at %llu sector boundaries",
+			(unsigned long long) ti->split_io);
+}
+
+/*
+ * Check that the loop file is regular and available.
+ */
+static int loop_check_file(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+	struct file *filp = lc->filp;
+	struct inode *inode = filp->f_mapping->host;
+
+	if (!inode) {
+		return -ENXIO;
+	}
+
+	if (!S_ISREG(inode->i_mode)) {
+		DMERR("%s is not a regular file", lc->path);
+		return -EINVAL;
+	}
+
+	if (mapping_writably_mapped(filp->f_mapping)) {
+		DMERR("%s is mapped into userspace for writing", lc->path);
+		return -EBUSY;
+	}
+
+	if (mapping_mapped(filp->f_mapping))
+		DMWARN("%s is mapped into userspace", lc->path);
+
+	if (!inode->i_sb || !inode->i_sb->s_bdev) {
+		DMWARN("%s has no blockdevice - switching to filesystem I/O", lc->path);
+		clear_bit(BLOCK_TYPE, &lc->flags);
+		set_bit(FILE_TYPE, &lc->flags);
+	}
+
+	if (IS_SWAPFILE(inode)) {
+		DMERR("%s is already in use", lc->path);
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+/*
+ * Check loop file size and store it in the loop context
+ */
+static int loop_setup_size(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+	struct inode *inode = lc->filp->f_mapping->host;
+	int r = -EINVAL;
+
+	lc->size = i_size_read(inode);
+	lc->blkbits = inode->i_blkbits;
+
+	if (!lc->size){
+		ti->error = "backing file is empty";
+		goto out;
+	}
+
+	DMDEBUG("set loop backing file size to %llu",
+		(unsigned long long) lc->size);
+
+	if (lc->size < (blk2sec(lc,1) << 9)) {
+		ti->error = "backing file cannot be less than one block in size";
+		goto out;
+	}
+
+	if (lc->offset & ((1 << lc->blkbits) - 1)) {
+		ti->error = "loop file offset must be a multiple of fs blocksize";
+		goto out;
+	}
+	if (lc->offset > (lc->size - (1 << 9))) {
+		ti->error = "loop file offset too large";
+		goto out;
+	}
+
+	lc->mapped_sectors = (lc->size - lc->offset) >> 9;
+
+	DMDEBUG("set mapped sectors to %llu (%llu bytes)",
+		(unsigned long long) lc->mapped_sectors,
+		(lc->size - lc->offset));
+
+	if ( (lc->offset + (lc->mapped_sectors << 9)) < lc->size)
+		DMWARN("not using %llu bytes in incomplete block at EOF",
+		       lc->size - (lc->offset + (lc->mapped_sectors << 9)));
+
+	if (lc->size - lc->offset < (ti->len << 9)) {
+		ti->error = "mapped region cannot be smaller than target size";
+		goto out;
+	}
+
+	return 0;
+out:
+	return r;
+}
+
+/*
+ * release a loop file
+ */
+static void loop_put_file(struct file *filp)
+{
+	if (!filp)
+		return;
+
+	file_truncate_unlock(filp);
+	filp_close(filp, NULL);
+}
+
+/*
+ * open loop file and perform type, availability and size checks.
+ */
+static int loop_get_file(struct dm_target *ti)
+{
+	int flags = ((dm_table_get_mode(ti->table) & FMODE_WRITE) ?
+		    O_RDWR : O_RDONLY) | O_LARGEFILE;
+	struct loop_c *lc = ti->private;
+	struct file *filp;
+	int r = 0;
+
+	ti->error = "could not open loop backing file";
+
+	filp = filp_open(lc->path, flags, 0);
+
+	if (IS_ERR(filp))
+		return PTR_ERR(filp);
+
+	lc->filp = filp;
+
+	r = loop_check_file(ti);
+	if (r)
+		goto out_put;
+
+	r = loop_setup_size(ti);
+	if (r)
+		goto out_put;
+
+	file_truncate_lock(filp);
+	return 0;
+
+out_put:
+	fput(filp);
+	return r;
+
+}
+
+/*
+ * invalidate mapped pages belonging to the loop file
+ */
+void loop_flush(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+
+	loop_invalidate_file(lc->filp);
+}
+
+/*--------------------------------------------------------------------
+ * Device-mapper target methods
+ *--------------------------------------------------------------------*/
+/*
+ * Generic loop map function. Re-base I/O to target begin and submit
+ * mapping request to ((struct loop_c *)ti->private)->map_fn.
+ */
+static int loop_map(struct dm_target *ti, struct bio *bio,
+					union map_info *context)
+{
+	struct loop_c *lc = ti->private;
+
+	if (unlikely(bio_barrier(bio)))
+		return -EOPNOTSUPP;
+	/* rebase bio to target begin */
+	bio->bi_sector -= ti->begin;
+	if (lc->map_fn)
+		return lc->map_fn(ti, bio);
+	BUG();
+	return -EIO;
+}
+
+/*
+ * Block status helper.
+ */
+static ssize_t loop_file_status(struct loop_c *lc, char *result, unsigned maxlen)
+{
+	ssize_t sz = 0;
+	struct file_map_c *fc = lc->map_data;
+	int qlen;
+
+	spin_lock_irq(&fc->lock);
+	qlen = bio_list_nr(&fc->work);
+	qlen += bio_list_nr(&fc->in);
+	spin_unlock_irq(&fc->lock);
+	DMEMIT("%d", qlen);
+	return sz;
+}
+
+/*
+ * File status helper.
+ */
+static ssize_t loop_block_status(struct loop_c *lc, char *result, unsigned maxlen)
+{
+	ssize_t sz = 0;
+	struct block_map_c *bc = lc->map_data;
+	int mru;
+	spin_lock_irq(&bc->mru_lock);
+	mru = bc->mru - bc->map;
+	spin_unlock_irq(&bc->mru_lock);
+	DMEMIT("%d %d", bc->nr_extents, mru);
+	return sz;
+}
+
+/*
+ * This needs some thought on handling unlinked backing files. some parts of
+ * the kernel return a cached name (now invalid), while others return a dcache
+ * "/path/to/foo (deleted)" name (never was/is valid). Which is better is
+ * debatable.
+ *
+ * On the one hand, using a cached name gives table output which is directly
+ * usable assuming the user re-creates the unlinked image file, on the other
+ * it is more consistent with e.g. swap to use the dcache name.
+*/
+static int loop_status(struct dm_target *ti, status_type_t type,
+				char *result, unsigned maxlen)
+{
+	struct loop_c *lc = (struct loop_c *) ti->private;
+	ssize_t sz = 0;
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		if (test_bit(BLOCK_TYPE, &lc->flags))
+			sz += loop_block_status(lc, result, maxlen - sz);
+		else if (test_bit(FILE_TYPE, &lc->flags))
+			sz += loop_file_status(lc, result, maxlen - sz);
+		break;
+
+	case STATUSTYPE_TABLE:
+		DMEMIT("%s %llu", lc->path, lc->offset);
+		break;
+	}
+	return 0;
+}
+
+/*
+ * Destroy a loopback mapping
+ */
+static void loop_dtr(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+
+
+	if ((dm_table_get_mode(ti->table) & FMODE_WRITE))
+		loop_invalidate_file(lc->filp);
+
+	if (test_bit(BLOCK_TYPE, &lc->flags) && lc->map_data)
+		destroy_block_map((struct block_map_c *)lc->map_data,
+				test_bit(VMALLOC, &lc->flags));
+	if (test_bit(FILE_TYPE, &lc->flags) && lc->map_data)
+		destroy_file_map(lc);
+
+	loop_put_file(lc->filp);
+	DMINFO("released file %s", lc->path);
+
+
+	kfree(lc);
+}
+
+/*
+ * Construct a loopback mapping: <path> <offset>
+ */
+static int loop_ctr(struct dm_target *ti, unsigned argc, char **argv)
+{
+	struct loop_c *lc = NULL;
+	int r = -EINVAL;
+
+	if (argc != 2) {
+		ti->error = "invalid argument count";
+		goto out;
+	}
+
+	r = -ENOMEM;
+	lc = kzalloc(sizeof(*lc), GFP_KERNEL);
+	if (!lc) {
+		ti->error = "cannot allocate loop context";
+		goto out;
+	}
+	lc->path = kstrdup(argv[0], GFP_KERNEL);
+	if (!lc->path) {
+		ti->error = "cannot allocate loop path", -ENOMEM;
+		goto out;
+	}
+
+	ti->private = lc;
+
+	r = -EINVAL;
+	if (sscanf(argv[1], "%lld", &lc->offset) != 1) {
+		ti->error = "invalid file offset";
+		goto out;
+	}
+
+	if (lc->offset)
+		DMDEBUG("setting file offset to %lld", lc->offset);
+
+	/* defaults */
+	set_bit(BLOCK_TYPE, &lc->flags);
+	/* open & check file and set size parameters */
+	r = loop_get_file(ti);
+	if (r) {
+		/* ti->error has been set by loop_get_file */
+		goto out;
+	}
+
+	if (test_bit(BLOCK_TYPE, &lc->flags))
+		r = setup_block_map(lc, lc->filp->f_mapping->host);
+	if (test_bit(FILE_TYPE, &lc->flags))
+		r = setup_file_map(lc);
+	set_split_io(ti);
+
+	if (r) {
+		ti->error = "could not create extent map";
+		goto out_putf;
+	}
+
+	if (lc->bdev)
+		dm_set_device_limits(ti, lc->bdev);
+
+	DMDEBUG("constructed loop target on %s "
+		"(%lldk, %llu sectors)", lc->path,
+		(lc->size >> 10), lc->mapped_sectors);
+
+	return 0;
+
+out_putf:
+	loop_put_file(lc->filp);
+out:
+	if(lc)
+		kfree(lc);
+	return r;
+}
+
+static struct target_type loop_target = {
+	.name = "loop",
+	.version = {0, 0, 1},
+	.module = THIS_MODULE,
+	.ctr = loop_ctr,
+	.dtr = loop_dtr,
+	.map = loop_map,
+	.presuspend = loop_flush,
+	.flush = loop_flush,
+	.status = loop_status,
+};
+
+/*--------------------------------------------------------------------
+ * Module bits
+ *--------------------------------------------------------------------*/
+int __init dm_loop_init(void)
+{
+	int r;
+
+	r = dm_register_target(&loop_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto out;
+	}
+
+	r = -ENOMEM;
+
+	extent_cache = kmem_cache_create("extent_cache", sizeof(struct extent),
+					0, SLAB_HWCACHE_ALIGN, NULL, NULL);
+	if (!extent_cache)
+		goto out;
+
+	DMINFO("registered %s", version);
+	return 0;
+
+out:
+	if (extent_cache)
+		kmem_cache_destroy(extent_cache);
+	return r;
+}
+
+void dm_loop_exit(void)
+{
+	int r;
+
+	r = dm_unregister_target(&loop_target);
+	kmem_cache_destroy(extent_cache);
+
+	if (r < 0)
+		DMERR("target unregister failed %d", r);
+	else
+		DMINFO("unregistered %s", version);
+}
+
+module_init(dm_loop_init);
+module_exit(dm_loop_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Bryn Reeves <breeves@redhat.com>");
+MODULE_DESCRIPTION("device-mapper loop target");

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2.6.20] updated dm-loop patch
  2007-02-06 11:35 Bryn M. Reeves
@ 2007-02-13 20:20 ` Bryn M. Reeves
  2007-02-15  0:51   ` Bryn M. Reeves
  0 siblings, 1 reply; 9+ messages in thread
From: Bryn M. Reeves @ 2007-02-13 20:20 UTC (permalink / raw)
  To: device-mapper development; +Cc: roland

[-- Attachment #1: Type: text/plain, Size: 777 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This version of the patch fixes a couple of problems that Roland found
with file offsets & the use of some conversion routines from dm.h:

replace to_sector/to_bytes as they truncate 64 bit args/return vals
fixed the checks & mapped area calculations for files with offset!=0
fix error reporting in loop_get_file

There are also a couple of minor cleanups:

remove unused arg from extent_range
remove use of deprecated invalidate_inode_pages
comment fixes

Kind regards,

Bryn.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFF0h2o6YSQoMYUY94RArvUAKCrxbWNhvcERujCDIMpd+7FMCEWOACglfTW
VDAfOwzYs685/1ecDHcHu8E=
=wpKo
-----END PGP SIGNATURE-----

[-- Attachment #2: dm-loop.patch --]
[-- Type: text/x-patch, Size: 24868 bytes --]

This implements a loopback target for device mapper allowing a regular
file to be treated as a block device.

Signed-off-by: Bryn Reeves <breeves@redhat.com>

===================================================================
diff --git a/drivers/md/dm-loop.c b/drivers/md/dm-loop.c
new file mode 100644
index 0000000..aa9dc05
--- /dev/null
+++ b/drivers/md/dm-loop.c
@@ -0,0 +1,1040 @@
+/*
+ * Copyright (C) 2006 Red Hat, Inc. All rights reserved.
+ *
+ * This file is part of device-mapper.
+ *
+ * drivers/md/dm-loop.c
+ *
+ * Extent mapping implementation heavily influenced by mm/swapfile.c
+ * Bryn Reeves <breeves@redhat.com>
+ *
+ * File mapping and block lookup algorithms support by
+ * Heinz Mauelshagen <hjm@redhat.com>.
+ * 
+ * This file is released under the GPL.
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <linux/vmalloc.h>
+#include <linux/syscalls.h>
+#include <linux/workqueue.h>
+#include <linux/file.h>
+#include <linux/bio.h>
+
+#include "dm.h"
+#include "dm-bio-list.h"
+#include "dm-bio-record.h"
+
+static const char *version = "v0.415";
+#define DAEMON "kloopd"
+#define DM_MSG_PREFIX "loop"
+
+struct workqueue_struct *kloopd;	/* workqueue */
+
+enum flags { BLOCK_TYPE, FILE_TYPE, VMALLOC };
+
+/*--------------------------------------------------------------------
+ * Loop context
+ *--------------------------------------------------------------------*/
+
+struct loop_c {
+	unsigned long flags;
+
+	/* information describing the backing store */
+	struct file *filp;		/* loop file handle */
+	char *path;			/* path argument */
+	loff_t offset;			/* offset argument */
+	struct block_device *bdev;	/* block device */
+	unsigned blkbits;		/* file system block size shift bits */
+
+	loff_t size;			/* size of entire file in bytes */
+	loff_t blocks;			/* blocks allocated to loop file */
+	sector_t mapped_sectors;	/* size of mapped area in sectors*/
+
+	/* mapping */
+	int (*map_fn)(struct dm_target*, struct bio*);
+	/* mapping function private data */
+	void *map_data;
+};
+
+/*
+ * block map extents
+ */
+struct extent {
+	sector_t start;
+	sector_t to;
+	sector_t len;
+};
+
+struct extent_list {
+	struct extent * extent;
+	struct list_head list;
+};
+
+struct kmem_cache *extent_cache;
+
+/* 
+ * block map private context
+ */
+struct block_map_c {
+	int nr_extents;			/* number of extents in map */
+	struct extent **map;		/* linear map of extent pointers */
+	struct extent **mru;		/* pointer to meu entry */
+	spinlock_t mru_lock;		/* protects mru */
+};
+
+/* 
+ * file map private context
+ */
+struct file_map_c {
+	spinlock_t lock;		/* protects in */
+	struct bio_list in;		/* new bios for processing */
+	struct bio_list work;		/* bios queued for processing */
+	struct work_struct ws;		/* loop work */
+	struct loop_c *loop;		/* for filp & offset */
+};
+
+/*--------------------------------------------------------------------
+ * Generic helper routines						    
+ *--------------------------------------------------------------------*/
+
+static inline sector_t blk2sec(struct loop_c *lc, blkcnt_t block)
+{
+	return block << (lc->blkbits - SECTOR_SHIFT);
+}
+
+static inline blkcnt_t sec2blk(struct loop_c *lc, sector_t sector)
+{
+	return sector >> (lc->blkbits - SECTOR_SHIFT);
+}
+
+/*--------------------------------------------------------------------
+ * File I/O helper routines						   
+ *--------------------------------------------------------------------*/
+
+/*
+ * transfer data to/from file. 
+ */
+static int fs_io(int rw, struct file *filp, loff_t *pos,
+		     struct bio_vec *bv)
+{
+	ssize_t r;
+	void *ptr = kmap(bv->bv_page) + bv->bv_offset;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(get_ds());
+	r = (rw == READ) ? filp->f_op->read(filp, ptr, bv->bv_len, pos) :
+			   filp->f_op->write(filp, ptr, bv->bv_len, pos);
+	set_fs(old_fs);
+	kunmap(bv->bv_page);
+	return r == bv->bv_len ? 0 : -EIO;
+}
+
+/*
+ * Handle IO for one bio
+ */
+static void do_one_bio(struct file_map_c *fc, struct bio *bio)
+{
+	int r = 0, rw = bio_data_dir(bio);
+	loff_t start = (bio->bi_sector << 9) + fc->loop->offset,
+		pos = start;
+	struct bio_vec *bv, *bv_end = bio->bi_io_vec + bio->bi_vcnt;
+
+	for (bv = bio->bi_io_vec; bv < bv_end; bv++) {
+		r = fs_io(rw, fc->loop->filp, &pos, bv);
+		if (r) {
+			DMWARN("%s error %d", rw ? "write":"read" , r);
+			break;
+		}
+	}
+
+	bio_endio(bio, pos - start, r);
+}
+
+/*
+ * Worker thread for a 'file' type loop device
+ */
+static void do_loop_work(struct work_struct *ws)
+{
+	struct file_map_c *fc = container_of(ws, struct file_map_c, ws);
+	struct bio *bio;
+
+	/* quickly grab all new ios queued and add them to the work list */
+	spin_lock_irq(&fc->lock);
+	bio_list_merge_init(&fc->work, &fc->in);
+	spin_unlock_irq(&fc->lock);
+
+	/* work the list and do file IO on all bios */
+	while ((bio = bio_list_pop(&fc->work)))
+		do_one_bio(fc, bio);
+}
+
+/*
+ * Create work queue and initialize work
+ */
+static int loop_work_init(void)
+{
+
+	kloopd = create_singlethread_workqueue(DAEMON);
+	if (!kloopd)
+		return -ENOMEM;
+
+	return 0;
+}
+
+/*
+ * Destroy work queue
+ */
+static void loop_work_exit(void)
+{
+	if (kloopd)
+		destroy_workqueue(kloopd);
+}
+
+/*
+ * FILE_TYPE map_fn. Mapping just queues ios to the file map
+ * context and lets the daemon deal with them.
+ */
+static int loop_file_map(struct dm_target *ti, struct bio *bio)
+{
+	int wake;
+	struct loop_c *lc = ti->private;
+	struct file_map_c *fc = lc->map_data;
+
+	spin_lock_irq(&fc->lock);
+	wake = bio_list_empty(&fc->in);
+	bio_list_add(&fc->in, bio);
+	spin_unlock_irq(&fc->lock);
+
+	/*
+	 * only call queue_work() if necessary to avoid
+	 * superfluous preempt_{disable/enable}() overhead.
+	 */
+	if (wake)
+		queue_work(kloopd, &fc->ws);
+
+	/* handling bio -> will submit later */
+	return 0;
+}
+
+static void destroy_file_map(struct loop_c *lc)
+{
+	kfree(lc->map_data);
+}
+
+/* 
+ * Set up a file map context and workqueue
+ */
+static int setup_file_map(struct loop_c *lc)
+{
+	struct file_map_c *fc = kzalloc(sizeof(*fc), GFP_KERNEL);
+	if (!fc)
+		return -ENOMEM;
+	lc->map_data = fc;
+	spin_lock_init(&fc->lock);
+	bio_list_init(&fc->in);
+	bio_list_init(&fc->work);
+	INIT_WORK(&fc->ws, do_loop_work);
+	fc->loop = lc;
+
+	lc->map_fn = loop_file_map;
+	return 0;
+}
+
+/*--------------------------------------------------------------------
+ * Block I/O helper routines
+ *--------------------------------------------------------------------*/
+
+static int contains_sector(struct extent *e, sector_t s)
+{
+	if (likely(e))
+		return s < (e->start + (e->len)) &&
+			s >= e->start;
+	BUG();
+	return 0;
+}
+
+/*
+ * Return an extent range (i.e. beginning+ending physical block numbers). 
+ */
+static int extent_range(struct inode * inode,
+			blkcnt_t logical_blk, blkcnt_t last_blk,
+			blkcnt_t *begin_blk, blkcnt_t *end_blk)
+{
+	sector_t dist = 0, phys_blk, probe_blk = logical_blk;
+
+	/* Find beginning physical block of extent starting at logical_blk. */
+	*begin_blk = phys_blk = bmap(inode, probe_blk);
+	if (!phys_blk)
+		return -ENXIO;
+
+	for (; phys_blk == *begin_blk + dist; dist++) {
+		*end_blk = phys_blk;
+		if (++probe_blk > last_blk)
+			break;
+
+		phys_blk = bmap(inode, probe_blk);
+		if (unlikely(!phys_blk))
+			return -ENXIO;
+	}
+
+	return 0;
+
+}
+
+/*
+ * Walk over a linked list of extent_list structures, freeing them as
+ * we go. Does not free el->extent.
+ */
+static void destroy_extent_list(struct list_head *head)
+{
+	struct list_head *curr, *n;
+
+	if (list_empty(head))
+		return;
+
+	list_for_each_safe(curr, n, head) {
+		struct extent_list *el;
+		el = list_entry(curr, struct extent_list, list);
+		list_del(curr);
+		kfree(el);
+	}
+}
+
+/*
+ * Add a new extent to the tail of the list at *head with 
+ * start/to/len parameters. Allocates from the extent cache.
+ */
+static int list_add_extent(struct list_head *head, 
+		sector_t start, sector_t to, sector_t len)
+{
+	struct extent *extent;
+	struct extent_list *list;
+	
+	if (!(extent = kmem_cache_alloc(extent_cache, GFP_KERNEL)))
+		goto out;
+	
+	if (!(list = kmalloc(sizeof(*list), GFP_KERNEL)))
+		goto out;
+
+	extent->start = start;
+	extent->to = to;
+	extent->len = len;
+
+	list->extent = extent;
+
+	list_add_tail(&list->list, head);
+
+	return 0;
+out:
+	if (extent)
+		kmem_cache_free(extent_cache, extent);
+	return -ENOMEM;
+}
+
+/* 
+ * Create a sequential list of extents from an inode and return 
+ * it in *head. On success the number of extents found is returned,
+ * or -ERRNO on error 
+ */
+static int loop_extents(struct loop_c *lc, struct inode *inode, 
+			struct list_head *head)
+{
+	sector_t start = 0;
+	int r, nr_extents = 0;
+	blkcnt_t nr_blks = 0, begin_blk = 0, end_blk = 0;
+	blkcnt_t last_blk = sec2blk(lc, 
+			(lc->mapped_sectors + (lc->offset >> 9))) - 1;
+
+	blkcnt_t logical_blk = sec2blk(lc, (lc->offset >> 9));
+
+	while (logical_blk <= last_blk) {
+		r = extent_range(inode, logical_blk, last_blk,
+				&begin_blk, &end_blk);
+		if (unlikely(r)) {
+			DMERR("%s has a hole; sparse file detected - "
+				"switching to filesystem I/O", lc->path);
+			clear_bit(BLOCK_TYPE, &lc->flags);
+			set_bit(FILE_TYPE, &lc->flags);
+			return r;
+		}
+
+		nr_blks = 1 + end_blk - begin_blk;
+
+		if (likely(nr_blks)) {
+			r = list_add_extent(head, start,
+				blk2sec(lc, begin_blk), 
+				blk2sec(lc, nr_blks));
+
+			if (unlikely(r))
+				return r;
+
+			nr_extents++;
+			start += blk2sec(lc, nr_blks);
+			begin_blk += nr_blks;
+			logical_blk += nr_blks;
+		}
+	}
+
+	return nr_extents;
+}
+
+/*
+ * Walk over the extents in a block_map_c, returning them to the cache and
+ * freeing bc via kfree or vfree as appropriate.
+ */
+static void destroy_block_map(struct block_map_c *bc, int v)
+{
+	int i;
+
+	if (!bc)
+		return;
+
+	for (i = 0; i < bc->nr_extents ; i++) {
+		kmem_cache_free(extent_cache, bc->map[i]);
+	}
+	DMDEBUG("%cfreeing block map of %d entries", (v)?'v':'k', i);
+	if (v)
+		vfree(bc->map);
+	else
+		kfree(bc->map);
+	kfree(bc);
+}
+
+/*
+ * Find an extent in *bc using binary search. Returns a pointer into the
+ * map of linear extent pointers. Calculate index as (extent - bc->map).
+ */
+static struct extent **extent_binary_lookup(struct block_map_c *bc,
+					   struct extent **extent_mru,
+					   sector_t sector)
+{
+	unsigned nr_extents = bc->nr_extents;
+	unsigned delta, dist, prev_dist = 0;
+	struct extent **eptr;
+
+	/* Optimize lookup range based on MRU extent. */
+	dist = extent_mru - bc->map;
+	if ((*extent_mru)->start < sector) {
+		delta = (nr_extents - dist) / 2;
+		dist += delta;
+	} else
+		delta = dist = dist / 2;
+
+	eptr = bc->map + dist;
+	while(*eptr && !contains_sector(*eptr, sector)) {
+		if (sector >= (*eptr)->start + (*eptr)->len) {
+			prev_dist = dist;
+			if (delta > 1)
+				delta /= 2;
+
+			dist += delta;
+		} else {
+			delta = (dist - prev_dist) / 2;
+			if (!delta)
+				delta = 1;
+
+			dist -= delta;
+		}
+		eptr = bc->map + dist;
+	}
+	return eptr;
+}
+
+/*
+ * Lookup an extent for a sector using the mru cache and binary search.
+ */
+static struct extent *extent_lookup(struct block_map_c *bc, sector_t sector)
+{
+	struct extent **eptr;
+
+	spin_lock_irq(&bc->mru_lock);
+	eptr = bc->mru;
+	spin_unlock_irq(&bc->mru_lock);
+
+	if (contains_sector(*eptr, sector))
+		return *eptr;
+
+	eptr = extent_binary_lookup(bc, eptr, sector);
+	if (!eptr)
+		return NULL;
+
+	spin_lock_irq(&bc->mru_lock);
+	bc->mru = eptr;
+	spin_unlock_irq(&bc->mru_lock);
+	return *eptr;
+}
+
+/*
+ * BLOCK_TYPE map_fn. Looks up the sector in the extent map and 
+ * rewrites the bio device and bi_sector fields.
+ */
+static int loop_block_map(struct dm_target *ti, struct bio *bio)
+{
+	struct loop_c *lc = ti->private;
+	struct extent *extent = extent_lookup(lc->map_data, bio->bi_sector);
+
+	if (likely(extent)) {
+		bio->bi_bdev = lc->bdev;
+		bio->bi_sector = extent->to +
+				 (bio->bi_sector - extent->start);
+		return 1;       /* Done with bio -> submit */
+	}
+
+	DMERR("no matching extent in map for sector %llu!",
+	      (unsigned long long) bio->bi_sector + ti->begin);
+	BUG();
+	return -EIO;
+
+}
+
+/*
+ * Turn an extent_list into a linear pointer map of nr_extents + 1 entries
+ * and set the final entry to NULL.
+ */
+static struct extent **build_extent_map(struct list_head *head, 
+				int nr_extents, unsigned long *flags)
+{
+	unsigned map_size, cache_size;
+	struct extent **map, **curr;
+	struct list_head *pos;
+
+	map_size = 1 +(sizeof(*map) * nr_extents);
+	cache_size = kmem_cache_size(extent_cache) * nr_extents;
+
+	/* FIXME: arbitrary limit (arch sensitive?)*/
+	if (map_size > (4 * PAGE_SIZE)) {
+		set_bit(VMALLOC, flags);
+		DMDEBUG("using vmalloc for extent map");
+		map = vmalloc(map_size);
+	} else
+		map = kmalloc(map_size, GFP_KERNEL);
+	if (!map)
+		return ERR_PTR(-ENOMEM);
+
+	curr = map;
+
+	DMDEBUG("allocated extent map of %u %s for %d extents (%u %s)", 
+		(map_size < 8192 ) ? map_size : map_size >> 10,	
+		(map_size < 8192 ) ? "bytes" : "kilobytes", nr_extents,
+		(cache_size < 8192) ? cache_size : cache_size >> 10,
+		(cache_size < 8192) ? "bytes" : "kilobytes");
+
+	list_for_each(pos, head) {
+		struct extent_list *el;
+		el = list_entry(pos, struct extent_list, list);
+		*(curr++) = el->extent;
+	}
+	*curr = NULL;
+	return map;
+}
+
+/* 
+ * Set up a block map context and extent map 
+ */
+static int setup_block_map(struct loop_c *lc, struct inode *inode)
+{
+	int r, nr_extents;
+	struct block_map_c *bc;
+	LIST_HEAD(head);
+
+	if (!inode || !inode->i_sb || !inode->i_sb->s_bdev) {
+		return -ENXIO;
+	}
+
+	/* build a linked list of extents in linear order */
+	r = loop_extents(lc, inode, &head);
+
+	if (r<0)
+		goto out;
+
+	nr_extents = r;
+	r = -ENOMEM;
+	
+	if (!(bc = kzalloc(sizeof(*bc), GFP_KERNEL)))
+		goto out;
+
+	/* create a linear map of pointers into the extent cache */
+	bc->map = build_extent_map(&head, nr_extents, &lc->flags);
+
+	if (IS_ERR(bc->map)) {
+		r = PTR_ERR(bc->map);
+		goto out;
+	}
+
+	destroy_extent_list(&head);
+
+	spin_lock_init(&bc->mru_lock);
+	bc->mru = bc->map;
+	bc->nr_extents = nr_extents;
+
+	lc->bdev = inode->i_sb->s_bdev;
+	lc->map_data = bc;
+	lc->map_fn = loop_block_map ;
+	return 0;
+
+out:
+	destroy_extent_list(&head);
+	return r;
+}
+
+/*--------------------------------------------------------------------
+ * Generic helper routines
+ *--------------------------------------------------------------------*/
+
+/*
+ * Invalidate all unlocked loop file pages
+ */
+static int loop_invalidate_file(struct file *filp)
+{
+	return  invalidate_mapping_pages(filp->f_mapping, 0, ~0UL);
+}
+
+/*
+ * acquire or release a "no-truncate" lock on *filp.
+ * We overload the S_SWAPFILE flag for loop targets because
+ * it provides the same no-truncate semantics we require, and
+ * holding onto i_sem is no longer an option.
+ */
+static void file_truncate_lock(struct file *filp)
+{
+	struct inode *inode = filp->f_mapping->host;
+
+	mutex_lock(&inode->i_mutex);
+	inode->i_flags |= S_SWAPFILE;
+	mutex_unlock(&inode->i_mutex);
+}
+
+static void file_truncate_unlock(struct file *filp)
+{
+	struct inode *inode = filp->f_mapping->host;
+
+	mutex_lock(&inode->i_mutex);
+	inode->i_flags &= ~S_SWAPFILE;
+	mutex_unlock(&inode->i_mutex);
+}
+
+/*
+ * Fill out split_io for taget backing store
+ */
+static void set_split_io(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+	if(test_bit(BLOCK_TYPE,&lc->flags))
+		/* Split I/O at block boundaries */
+		ti->split_io = 1 << (lc->blkbits - SECTOR_SHIFT);
+	else
+		ti->split_io = 64;
+	DMDEBUG("splitting io at %llu sector boundaries", 
+			(unsigned long long) ti->split_io);
+}
+
+/* 
+ * Check that the loop file is regular and available.
+ */
+static int loop_check_file(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+	struct file *filp = lc->filp;
+	struct inode *inode = filp->f_mapping->host;
+
+	if (!inode) {
+		return -ENXIO;
+	}
+
+	if (!S_ISREG(inode->i_mode)) {
+		DMERR("%s is not a regular file", lc->path);
+		return -EINVAL;
+	}
+
+	if (mapping_writably_mapped(filp->f_mapping)) {
+		DMERR("%s is mapped into userspace for writing", lc->path);
+		return -EBUSY;
+	}
+
+	if (mapping_mapped(filp->f_mapping))
+		DMWARN("%s is mapped into userspace", lc->path);
+
+	if (!inode->i_sb || !inode->i_sb->s_bdev) {
+		DMWARN("%s has no blockdevice - switching to filesystem I/O", lc->path);
+		clear_bit(BLOCK_TYPE, &lc->flags);
+		set_bit(FILE_TYPE, &lc->flags);
+	}
+
+	if (IS_SWAPFILE(inode)) {
+		DMERR("%s is already in use", lc->path);
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+/*
+ * Check loop file size and store it in the loop context
+ */
+static int loop_setup_size(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+	struct inode *inode = lc->filp->f_mapping->host;
+	int r = -EINVAL;
+
+	lc->size = i_size_read(inode);
+	lc->blkbits = inode->i_blkbits;
+
+	if (!lc->size){
+		ti->error = "backing file is empty";
+		goto out;
+	}
+
+	DMDEBUG("set loop backing file size to %llu",
+		(unsigned long long) lc->size);
+
+	if (lc->size < (blk2sec(lc,1) << 9)) {
+		ti->error = "backing file cannot be less than one block in size";
+		goto out;
+	}
+
+	if (lc->offset & ((1 << lc->blkbits) - 1)) {
+		ti->error = "loop file offset must be a multiple of fs blocksize";
+		goto out;
+	} 
+	if (lc->offset > (lc->size - (1 << 9))) {
+		ti->error = "loop file offset too large";
+		goto out;
+	}
+
+	lc->mapped_sectors = (lc->size - lc->offset) >> 9;
+
+	DMDEBUG("set mapped sectors to %llu (%llu bytes)",
+		(unsigned long long) lc->mapped_sectors,
+		(lc->size - lc->offset));
+
+	if ( (lc->offset + (lc->mapped_sectors << 9)) < lc->size)
+		DMWARN("not using %llu bytes in incomplete block at EOF",
+		       lc->size - (lc->offset + (lc->mapped_sectors << 9)));
+
+	if (lc->size - lc->offset < (ti->len << 9)) {
+		ti->error = "mapped region cannot be smaller than target size";
+		goto out;
+	}
+
+	return 0;
+out:
+	return r;
+}
+
+/*
+ * release a loop file
+ */
+static void loop_put_file(struct file *filp)
+{
+	if (!filp)
+		return;
+
+	file_truncate_unlock(filp);
+	filp_close(filp, NULL);
+}
+
+/*
+ * open loop file and perform type, availability and size checks.
+ */
+static int loop_get_file(struct dm_target *ti)
+{
+	int flags = ((dm_table_get_mode(ti->table) & FMODE_WRITE) ?
+		    O_RDWR : O_RDONLY) | O_LARGEFILE;
+	struct loop_c *lc = ti->private;
+	struct file *filp;
+	int r = 0;
+
+	ti->error = "could not open loop backing file";
+
+	filp = filp_open(lc->path, flags, 0);
+	
+	if (IS_ERR(filp))
+		return PTR_ERR(filp);
+
+	lc->filp = filp;
+
+	r = loop_check_file(ti);
+	if (r)
+		goto out_put;
+
+	r = loop_setup_size(ti);
+	if (r)
+		goto out_put;
+
+	file_truncate_lock(filp);
+	return 0;
+
+out_put:
+	fput(filp);
+	return r;
+	
+}
+
+/*
+ * invalidate mapped pages belonging to the loop file
+ */
+void loop_flush(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+
+	loop_invalidate_file(lc->filp);
+}
+
+/*--------------------------------------------------------------------
+ * Device-mapper target methods
+ *--------------------------------------------------------------------*/
+/*
+ * Generic loop map function. Re-base I/O to target begin and submit
+ * mapping request to ((struct loop_c *)ti->private)->map_fn.
+ */
+static int loop_map(struct dm_target *ti, struct bio *bio,
+					union map_info *context)
+{
+	struct loop_c *lc = ti->private;
+
+	if (unlikely(bio_barrier(bio)))
+		return -EOPNOTSUPP;
+	/* rebase bio to target begin */
+	bio->bi_sector -= ti->begin;
+	if (lc->map_fn)
+		return lc->map_fn(ti, bio);
+	BUG();
+	return -EIO;
+}
+
+/*
+ * Block status helper.
+ */
+static ssize_t loop_file_status(struct loop_c *lc, char *result, unsigned maxlen)
+{
+	ssize_t sz = 0;
+	struct file_map_c *fc = lc->map_data;
+	int qlen;
+
+	spin_lock_irq(&fc->lock);
+	qlen = bio_list_nr(&fc->work);
+	qlen += bio_list_nr(&fc->in);
+	spin_unlock_irq(&fc->lock);
+	DMEMIT("%d", qlen);
+	return sz;
+}
+
+/*
+ * File status helper.
+ */
+static ssize_t loop_block_status(struct loop_c *lc, char *result, unsigned maxlen)
+{
+	ssize_t sz = 0;
+	struct block_map_c *bc = lc->map_data;
+	int mru;
+	spin_lock_irq(&bc->mru_lock);
+	mru = bc->mru - bc->map;
+	spin_unlock_irq(&bc->mru_lock);
+	DMEMIT("%d %d", bc->nr_extents, mru);
+	return sz;
+}
+
+/*
+ * This needs some thought on handling unlinked backing files. some parts of
+ * the kernel return a cached name (now invalid), while others return a dcache
+ * "/path/to/foo (deleted)" name (never was/is valid). Which is better is
+ * debatable.
+ *
+ * On the one hand, using a cached name gives table output which is directly
+ * usable assuming the user re-creates the unlinked image file, on the other
+ * it is more consistent with e.g. swap to use the dcache name.
+*/
+static int loop_status(struct dm_target *ti, status_type_t type,
+				char *result, unsigned maxlen)
+{
+	struct loop_c *lc = (struct loop_c *) ti->private;
+	ssize_t sz = 0;
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		if (test_bit(BLOCK_TYPE, &lc->flags))
+			sz += loop_block_status(lc, result, maxlen - sz);
+		else if (test_bit(FILE_TYPE, &lc->flags))
+			sz += loop_file_status(lc, result, maxlen - sz);
+		break;
+
+	case STATUSTYPE_TABLE:
+		DMEMIT("%s %llu", lc->path, lc->offset);
+		break;
+	}
+	return 0;
+}
+
+/*
+ * Destroy a loopback mapping
+ */
+static void loop_dtr(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+
+	
+	if ((dm_table_get_mode(ti->table) & FMODE_WRITE))
+		loop_invalidate_file(lc->filp);
+
+	if (test_bit(BLOCK_TYPE, &lc->flags) && lc->map_data)
+		destroy_block_map((struct block_map_c *)lc->map_data,
+				test_bit(VMALLOC, &lc->flags));
+	if (test_bit(FILE_TYPE, &lc->flags) && lc->map_data)
+		destroy_file_map(lc);
+
+	loop_put_file(lc->filp);
+	DMINFO("released file %s", lc->path);
+
+		
+	kfree(lc);
+}
+
+/*
+ * Construct a loopback mapping: <path> <offset>
+ */
+static int loop_ctr(struct dm_target *ti, unsigned argc, char **argv)
+{
+	struct loop_c *lc = NULL;
+	int r = -EINVAL;
+
+	if (argc != 2) {
+		ti->error = "invalid argument count";
+		goto out;
+	}
+
+	r = -ENOMEM;
+	lc = kzalloc(sizeof(*lc), GFP_KERNEL);
+	if (!lc) {
+		ti->error = "cannot allocate loop context";
+		goto out;
+	}
+	lc->path = kstrdup(argv[0], GFP_KERNEL);
+	if (!lc->path) {
+		ti->error = "cannot allocate loop path", -ENOMEM;
+		goto out;
+	}
+
+	ti->private = lc;
+
+	r = -EINVAL;
+	if (sscanf(argv[1], "%lld", &lc->offset) != 1) {
+		ti->error = "invalid file offset";
+		goto out;
+	}
+
+	if (lc->offset) 
+		DMDEBUG("setting file offset to %lld", lc->offset);
+
+	/* defaults */
+	set_bit(BLOCK_TYPE, &lc->flags);
+	/* open & check file and set size parameters */
+	r = loop_get_file(ti);
+	if (r) {
+		/* ti->error has been set by loop_get_file */
+		goto out;
+	}
+
+	if (test_bit(BLOCK_TYPE, &lc->flags))
+		r = setup_block_map(lc, lc->filp->f_mapping->host);
+	if (test_bit(FILE_TYPE, &lc->flags))
+		r = setup_file_map(lc);
+	set_split_io(ti);
+
+	if (r) {
+		ti->error = "could not create extent map";
+		goto out_putf;
+	}
+
+	if (lc->bdev)
+		dm_set_device_limits(ti, lc->bdev);
+
+	DMDEBUG("constructed loop target on %s "
+		"(%lldk, %llu sectors)", lc->path,
+		(lc->size >> 10), lc->mapped_sectors);
+
+	return 0;
+
+out_putf:
+	loop_put_file(lc->filp);
+out:
+	if(lc)
+		kfree(lc);
+	return r;
+}
+
+static struct target_type loop_target = {
+	.name = "loop",
+	.version = {0, 0, 1},
+	.module = THIS_MODULE,
+	.ctr = loop_ctr,
+	.dtr = loop_dtr,
+	.map = loop_map,
+	.presuspend = loop_flush,
+	.flush = loop_flush,
+	.status = loop_status,
+};
+
+/*--------------------------------------------------------------------
+ * Module bits
+ *--------------------------------------------------------------------*/
+int __init dm_loop_init(void)
+{
+	int r;
+
+	r = dm_register_target(&loop_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto out;
+	}
+
+	r = -ENOMEM;
+
+	extent_cache = kmem_cache_create("extent_cache", sizeof(struct extent), 
+					0, SLAB_HWCACHE_ALIGN, NULL, NULL);
+	if (!extent_cache)
+		goto out;
+
+	r = loop_work_init();
+	
+	if (r)
+		goto out;
+
+	DMINFO("registered %s", version);
+	return 0;
+
+out:
+	if (kloopd)
+		loop_work_exit();
+	if (extent_cache)
+		kmem_cache_destroy(extent_cache);
+	return r;
+}
+
+void dm_loop_exit(void)
+{
+	int r;
+
+	loop_work_exit();
+
+	r = dm_unregister_target(&loop_target);
+	kmem_cache_destroy(extent_cache);
+
+	if (r < 0)
+		DMERR("target unregister failed %d", r);
+	else
+		DMINFO("unregistered %s", version);
+}
+
+module_init(dm_loop_init);
+module_exit(dm_loop_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Bryn Reeves <breeves@redhat.com>");
+MODULE_DESCRIPTION("device-mapper loop target");

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2.6.20] updated dm-loop patch
@ 2007-02-06 11:35 Bryn M. Reeves
  2007-02-13 20:20 ` Bryn M. Reeves
  0 siblings, 1 reply; 9+ messages in thread
From: Bryn M. Reeves @ 2007-02-06 11:35 UTC (permalink / raw)
  To: device-mapper development

[-- Attachment #1: Type: text/plain, Size: 1701 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This is a revised version of the device mapper loop target. The patch
applies to 2.6.20, but you'll also need the dm-bio-list-helpers patch
posted earlier.

This release  adds a number of features and removes some of the
limitations of the previous patch:

- - merge Heinz's lookup work & fs I/O support
- - rework allocation for the extent map & extents
- - reorganise context data structures
- - fallback from sparse file detection

* dm-loop can now support files with an arbitrary number of extents
(limited only by available memory) as well as networked file systems and
device-backed files containing holes (sparse files).

* Performance should be much better for large/fragmented backing files.
The old linear code has been replaced with a binary search and we no
longer allocate huge chunks of kernel memory for the extent table.

* Table format is unchanged: <loop path> <offset>, for e.g:

	0 2048 loop /data/img0 0

If you are using a recent version of dmsetup you can symlink it to
either 'dmlosetup' or 'losetup' and use it in much the same way as the
regular losetup.

This version has been tested on ext2/3, NFS and SAMBA. Earlier versions
of the block mapping code have also been tested with XFS, JFS & reiserfs
- - there shouldn't be any problems here, but please report any unexpected
behavior.

Please give the new patch a try and post any problem reports / feedback.

Thanks!

Bryn.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFFyGgP6YSQoMYUY94RAjuYAJ9ieZecyKs6HnhBtvnMfQqaAoXsJACfX+cW
z839Ouwlw4XLHm8SQiugAJs=
=u3wY
-----END PGP SIGNATURE-----

[-- Attachment #2: dm-loop.patch --]
[-- Type: text/x-patch, Size: 24546 bytes --]

This implements a loopback target for device mapper, allowing a regular
file to be treated as a block device.

Signed-off-by: Bryn Reeves <breeves@redhat.com>

===================================================================
diff --git a/drivers/md/dm-loop.c b/drivers/md/dm-loop.c
new file mode 100644
index 0000000..e684402
--- /dev/null
+++ b/drivers/md/dm-loop.c
@@ -0,0 +1,1018 @@
+/*
+ * Copyright (C) 2006 Red Hat, Inc. All rights reserved.
+ *
+ * This file is part of device-mapper.
+ *
+ * drivers/md/dm-loop.c
+ *
+ * Extent mapping implementation heavily influenced by mm/swapfile.c
+ * Bryn Reeves <breeves@redhat.com>
+ *
+ * File mapping and block lookup algorithms support by
+ * Heinz Mauelshagen <hjm@redhat.com>.
+ * 
+ * This file is released under the GPL.
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <linux/vmalloc.h>
+#include <linux/syscalls.h>
+#include <linux/workqueue.h>
+#include <linux/file.h>
+#include <linux/bio.h>
+
+#include "dm.h"
+#include "dm-bio-list.h"
+#include "dm-bio-record.h"
+
+static const char *version = "v0.412";
+#define DAEMON "kloopd"
+
+#define DM_MSG_PREFIX "loop"
+
+enum flags { BLOCK_TYPE, FILE_TYPE, VMALLOC };
+
+/*--------------------------------------------------------------------
+ * Loop context
+ *--------------------------------------------------------------------*/
+
+struct loop_c {
+	unsigned long flags;
+
+	/* information describing the backing store */
+	struct file *filp;		/* loop file handle */
+	char *path;			/* path argument */
+	loff_t offset;			/* offset argument */
+	struct block_device *bdev;	/* block device */
+	unsigned blkbits;		/* file system block size shift bits */
+
+	loff_t size;			/* size of entire file in bytes */
+	loff_t blocks;			/* blocks allocated to loop file */
+	sector_t mapped_sectors;	/* size of mapped area in sectors*/
+
+	/* mapping */
+	int (*map_fn)(struct dm_target*, struct bio*);
+	/* mapping function private data */
+	void *map_data;
+};
+
+/*
+ * block map extents
+ */
+struct extent {
+	sector_t start;
+	sector_t to;
+	sector_t len;
+};
+
+struct extent_list {
+	struct extent * extent;
+	struct list_head list;
+};
+
+struct kmem_cache *extent_cache;
+
+/* 
+ * block map private context
+ */
+struct block_map_c {
+	int nr_extents;			/* number of extents in map */
+	struct extent **map;		/* linear map of extent pointers */
+	struct extent **mru;		/* pointer to meu entry */
+	spinlock_t mru_lock;		/* protects mru */
+};
+
+/* 
+ * file map private context
+ */
+struct file_map_c {
+	spinlock_t lock;		/* protects in */
+	struct bio_list in;		/* new bios for processing */
+	struct bio_list work;		/* bios queued for processing */
+	struct workqueue_struct *wq;	/* workqueue MOVEME */
+	struct work_struct ws;		/* loop work */
+	struct loop_c *loop;		/* for filp & offset.. ych y fi. */
+};
+
+/*--------------------------------------------------------------------
+ * Generic helper routines						    
+ *--------------------------------------------------------------------*/
+
+static inline sector_t blk2sec(struct loop_c *lc, blkcnt_t block)
+{
+	return block << (lc->blkbits - SECTOR_SHIFT);
+}
+
+static inline blkcnt_t sec2blk(struct loop_c *lc, sector_t sector)
+{
+	return sector >> (lc->blkbits - SECTOR_SHIFT);
+}
+
+/*--------------------------------------------------------------------
+ * File I/O helper routines						   
+ *--------------------------------------------------------------------*/
+
+/*
+ * transfer data to/from file. 
+ */
+static int fs_io(int rw, struct file *filp, loff_t *pos,
+		     struct bio_vec *bv)
+{
+	ssize_t r;
+	void *ptr = kmap(bv->bv_page) + bv->bv_offset;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(get_ds());
+	r = (rw == READ) ? filp->f_op->read(filp, ptr, bv->bv_len, pos) :
+			   filp->f_op->write(filp, ptr, bv->bv_len, pos);
+	set_fs(old_fs);
+	kunmap(bv->bv_page);
+	return r == bv->bv_len ? 0 : -EIO;
+}
+
+/*
+ * Handle IO for one bio
+ */
+static void do_one_bio(struct file_map_c *fc, struct bio *bio)
+{
+	int r = 0, rw = bio_data_dir(bio);
+	loff_t start = to_bytes(bio->bi_sector) + fc->loop->offset,
+		pos = start;
+	struct bio_vec *bv, *bv_end = bio->bi_io_vec + bio->bi_vcnt;
+
+	for (bv = bio->bi_io_vec; bv < bv_end; bv++) {
+		r = fs_io(rw, fc->loop->filp, &pos, bv);
+		if (r) {
+			DMWARN("%s error %d", rw ? "write":"read" , r);
+			break;
+		}
+	}
+
+	bio_endio(bio, pos - start, r);
+}
+
+/*
+ * Worker thread for a 'file' type loop device
+ */
+static void do_loop_work(struct work_struct *ws)
+{
+	struct file_map_c *fc = container_of(ws, struct file_map_c, ws);
+	struct bio *bio;
+
+	/* quickly grab all new ios queued and add them to the work list */
+	spin_lock_irq(&fc->lock);
+	bio_list_merge_init(&fc->work, &fc->in);
+	spin_unlock_irq(&fc->lock);
+
+	/* work the list and do file IO on all bios */
+	while ((bio = bio_list_pop(&fc->work)))
+		do_one_bio(fc, bio);
+}
+
+/*
+ * Create work queue and initialize work
+ */
+static int loop_work_init(struct loop_c *lc)
+{
+	struct file_map_c *fc = lc->map_data;
+
+	fc->wq = create_singlethread_workqueue(DAEMON);
+	if (!fc->wq)
+		return -ENOMEM;
+
+	INIT_WORK(&fc->ws, do_loop_work);
+	return 0;
+}
+
+/*
+ * Destroy work queue
+ */
+static void loop_work_exit(struct file_map_c *fc)
+{
+	if (fc->wq)
+		destroy_workqueue(fc->wq);
+}
+
+/*
+ * FILE_TYPE map_fn. Mapping just queues ios to the file map
+ * context and lets the daemon deal with them.
+ */
+static int loop_file_map(struct dm_target *ti, struct bio *bio)
+{
+	int wake;
+	struct loop_c *lc = ti->private;
+	struct file_map_c *fc = lc->map_data;
+
+	spin_lock_irq(&fc->lock);
+	wake = bio_list_empty(&fc->in);
+	bio_list_add(&fc->in, bio);
+	spin_unlock_irq(&fc->lock);
+
+	/*
+	 * only call queue_work() if necessary to avoid
+	 * superfluous preempt_{disable/enable}() overhead.
+	 */
+	if (wake)
+		queue_work(fc->wq, &fc->ws);
+
+	/* handling bio -> will submit later */
+	return 0;
+}
+
+static void destroy_file_map(struct loop_c *lc)
+{
+	loop_work_exit(lc->map_data);
+	kfree(lc->map_data);
+}
+
+/* 
+ * Set up a file map context and workqueue
+ */
+static int setup_file_map(struct loop_c *lc)
+{
+	struct file_map_c *fc = kzalloc(sizeof(*fc), GFP_KERNEL);
+	if (!fc)
+		return -ENOMEM;
+	lc->map_data = fc;
+	spin_lock_init(&fc->lock);
+	bio_list_init(&fc->in);
+	bio_list_init(&fc->work);
+	fc->loop = lc;
+
+	lc->map_fn = loop_file_map;
+	return loop_work_init(lc);
+}
+
+/*--------------------------------------------------------------------
+ * Block I/O helper routines
+ *--------------------------------------------------------------------*/
+
+static int contains_sector(struct extent *e, sector_t s)
+{
+	if (likely(e))
+		return s < (e->start + (e->len)) &&
+			s >= e->start;
+	BUG();
+	return 0;
+}
+
+/*
+ * Return an extent range (i.e. beginning+ending physical block numbers). 
+ */
+static int extent_range(struct loop_c * lc, struct inode * inode, blkcnt_t logical_blk,
+			blkcnt_t last_blk, blkcnt_t *begin_blk, blkcnt_t *end_blk)
+{
+	sector_t dist = 0, phys_blk, probe_blk = logical_blk;
+
+	/* Find beginning physical block of extent starting at logical_blk. */
+	*begin_blk = phys_blk = bmap(inode, probe_blk);
+	if (!phys_blk)
+		return -ENXIO;
+
+	for (; phys_blk == *begin_blk + dist; dist++) {
+		*end_blk = phys_blk;
+		if (++probe_blk > last_blk)
+			break;
+
+		phys_blk = bmap(inode, probe_blk);
+		if (unlikely(!phys_blk))
+			return -ENXIO;
+	}
+
+	return 0;
+
+}
+
+/*
+ * Walk over a linked list of extent_list structures, freeing them as
+ * we go. Does not free el->extent.
+ */
+static void destroy_extent_list(struct list_head *head)
+{
+	struct list_head *curr, *n;
+
+	if (list_empty(head))
+		return;
+
+	list_for_each_safe(curr, n, head) {
+		struct extent_list *el;
+		el = list_entry(curr, struct extent_list, list);
+		list_del(curr);
+		kfree(el);
+	}
+}
+
+/*
+ * Add a new extent to the tail of the list at *head with 
+ * start/to/len parameters. Allocates from the extent cache.
+ */
+static int list_add_extent(struct list_head *head, 
+		sector_t start, sector_t to, sector_t len)
+{
+	struct extent *extent;
+	struct extent_list *list;
+	
+	if (!(extent = kmem_cache_alloc(extent_cache, GFP_KERNEL)))
+		goto out;
+	
+	if (!(list = kmalloc(sizeof(*list), GFP_KERNEL)))
+		goto out;
+
+	extent->start = start;
+	extent->to = to;
+	extent->len = len;
+
+	list->extent = extent;
+
+	list_add_tail(&list->list, head);
+
+	return 0;
+out:
+	if (extent)
+		kmem_cache_free(extent_cache, extent);
+	return -ENOMEM;
+}
+
+/* 
+ * Create a sequential list of extents from an inode and return 
+ * it in *head. On success the number of extents found is returned,
+ * or -ERRNO on error 
+ */
+static int loop_extents(struct loop_c *lc, struct inode *inode, 
+			struct list_head *head)
+{
+	sector_t start = 0;
+	int r, nr_extents = 0;
+	blkcnt_t nr_blks = 0, begin_blk = 0, end_blk = 0;
+	blkcnt_t last_blk = sec2blk(lc, lc->mapped_sectors) - 1;
+	blkcnt_t logical_blk = sec2blk(lc, lc->offset);
+
+	while (logical_blk <= last_blk) {
+		r = extent_range(lc, inode, logical_blk, last_blk,
+				&begin_blk, &end_blk);
+		if (unlikely(r)) {
+			DMERR("%s has a hole; sparse file detected - "
+				"switching to filesystem I/O", lc->path);
+			clear_bit(BLOCK_TYPE, &lc->flags);
+			set_bit(FILE_TYPE, &lc->flags);
+			return r;
+		}
+
+		nr_blks = 1 + end_blk - begin_blk;
+
+		if (likely(nr_blks)) {
+			r = list_add_extent(head, start,
+				blk2sec(lc, begin_blk), 
+				blk2sec(lc, nr_blks));
+
+			if (unlikely(r))
+				return r;
+
+			nr_extents++;
+			start += blk2sec(lc, nr_blks);
+			begin_blk += nr_blks;
+			logical_blk += nr_blks;
+		}
+	}
+
+	return nr_extents;
+}
+
+/*
+ * Walk over the extents in a block_map_c, returning them to the cache and
+ * freeing bc via kfree or vfree as appropriate.
+ */
+static void destroy_block_map(struct block_map_c *bc, int v)
+{
+	int i;
+
+	if (!bc)
+		return;
+
+	for (i = 0; i < bc->nr_extents ; i++) {
+		kmem_cache_free(extent_cache, bc->map[i]);
+	}
+	DMDEBUG("%cfreeing block map of %d entries", (v)?'v':'k', i);
+	if (v)
+		vfree(bc->map);
+	else
+		kfree(bc->map);
+	kfree(bc);
+}
+
+/*
+ * Find an extent in *bc using binary search. Returns a pointer into the
+ * map of linear extent pointers. Calculate index as (extent - bc->map).
+ */
+static struct extent **extent_binary_lookup(struct block_map_c *bc,
+					   struct extent **extent_mru,
+					   sector_t sector)
+{
+	unsigned nr_extents = bc->nr_extents;
+	unsigned delta, dist, prev_dist = 0;
+	struct extent **eptr;
+
+	/* Optimize lookup range based on MRU extent. */
+	dist = extent_mru - bc->map;
+	if ((*extent_mru)->start < sector) {
+		delta = (nr_extents - dist) / 2;
+		dist += delta;
+	} else
+		delta = dist = dist / 2;
+
+	eptr = bc->map + dist;
+	while(*eptr && !contains_sector(*eptr, sector)) {
+		if (sector >= (*eptr)->start + (*eptr)->len) {
+			prev_dist = dist;
+			if (delta > 1)
+				delta /= 2;
+
+			dist += delta;
+		} else {
+			delta = (dist - prev_dist) / 2;
+			if (!delta)
+				delta = 1;
+
+			dist -= delta;
+		}
+		eptr = bc->map + dist;
+	}
+	return eptr;
+}
+
+/*
+ * Lookup an extent for a sector using the mru cache and binary search.
+ */
+static struct extent *extent_lookup(struct block_map_c *bc, sector_t sector)
+{
+	struct extent **eptr;
+
+	spin_lock_irq(&bc->mru_lock);
+	eptr = bc->mru;
+	spin_unlock_irq(&bc->mru_lock);
+
+	if (contains_sector(*eptr, sector))
+		return *eptr;
+
+	eptr = extent_binary_lookup(bc, eptr, sector);
+	if (!eptr)
+		return NULL;
+
+	spin_lock_irq(&bc->mru_lock);
+	bc->mru = eptr;
+	spin_unlock_irq(&bc->mru_lock);
+	return *eptr;
+}
+
+/*
+ * BLOCK_TYPE map_fn. Looks up the sector in the extent map and 
+ * rewrites the bio device and bi_sector fields.
+ */
+static int loop_block_map(struct dm_target *ti, struct bio *bio)
+{
+	struct loop_c *lc = ti->private;
+	struct extent *extent = extent_lookup(lc->map_data, bio->bi_sector);
+
+	if (likely(extent)) {
+		bio->bi_bdev = lc->bdev;
+		bio->bi_sector = extent->to +
+				 (bio->bi_sector - extent->start);
+		return 1;       /* Done with bio -> submit */
+	}
+
+	DMERR("no matching extent in map for sector %llu!",
+	      (unsigned long long) bio->bi_sector + ti->begin);
+	//BUG();
+	return -EIO;
+
+}
+
+/*
+ * Turn an extent_list into a linear pointer map of nr_extents + 1 entries
+ * and set the final entry to NULL.
+ */
+static struct extent **build_extent_map(struct list_head *head, 
+				int nr_extents, unsigned long *flags)
+{
+	unsigned map_size, cache_size;
+	struct extent **map, **curr;
+	struct list_head *pos;
+
+	map_size = 1 +(sizeof(**map) * nr_extents);
+	cache_size = kmem_cache_size(extent_cache) * nr_extents;
+
+	/* FIXME: arbitrary limit (arch sensitive?)*/
+	if (map_size > (4 * PAGE_SIZE)) {
+		set_bit(VMALLOC, flags);
+		DMDEBUG("using vmalloc for extent map");
+		map = vmalloc(map_size);
+	} else
+		map = kmalloc(map_size, GFP_KERNEL);
+	if (!map)
+		return ERR_PTR(-ENOMEM);
+
+	curr = map;
+
+	DMDEBUG("allocated linear extent map of %u %s for %d extents (%u %s)", 
+		(map_size < 8192 ) ? map_size : map_size >> 10,	
+		(map_size < 8192 ) ? "bytes" : "kilobytes", nr_extents,
+		(cache_size < 8192) ? cache_size : cache_size >> 10,
+		(cache_size < 8192) ? "bytes" : "kilobytes");
+
+	list_for_each(pos, head) {
+		struct extent_list *el;
+		el = list_entry(pos, struct extent_list, list);
+		*(curr++) = el->extent;
+	}
+	*curr = NULL;
+	return map;
+}
+
+/* 
+ * Set up a block map context and extent map 
+ */
+static int setup_block_map(struct loop_c *lc, struct inode *inode)
+{
+	int r, nr_extents;
+	struct block_map_c *bc;
+	LIST_HEAD(head);
+
+	if (!inode || !inode->i_sb || !inode->i_sb->s_bdev) {
+		return -ENXIO;
+	}
+
+	/* build a linked list of extents in linear order */
+	r = loop_extents(lc, inode, &head);
+
+	if (r<0)
+		goto out;
+
+	nr_extents = r;
+	r = -ENOMEM;
+	
+	if (!(bc = kzalloc(sizeof(*bc), GFP_KERNEL)))
+		goto out;
+
+	/* create a linear map of pointers into the extent cache */
+	bc->map = build_extent_map(&head, nr_extents, &lc->flags);
+
+	if (IS_ERR(bc->map)) {
+		r = PTR_ERR(bc->map);
+		goto out;
+	}
+
+	destroy_extent_list(&head);
+
+	spin_lock_init(&bc->mru_lock);
+	bc->mru = bc->map;
+	bc->nr_extents = nr_extents;
+
+	lc->bdev = inode->i_sb->s_bdev;
+	lc->map_data = bc;
+	lc->map_fn = loop_block_map ;
+	return 0;
+
+out:
+	destroy_extent_list(&head);
+	return r;
+}
+
+/*--------------------------------------------------------------------
+ * Generic helper routines
+ *--------------------------------------------------------------------*/
+
+/*
+ * Invalidate all unlocked loop file pages
+ */
+static int loop_invalidate_file(struct file *filp)
+{
+	return invalidate_inode_pages(filp->f_mapping);
+}
+
+/*
+ * acquire or release a "no-truncate" lock on *filp.
+ * We overload the S_SWAPFILE flag for loop targets because
+ * it provides the same no-truncate semantics we require, and
+ * holding onto i_sem is no longer an option.
+ */
+static void file_truncate_lock(struct file *filp)
+{
+	struct inode *inode = filp->f_mapping->host;
+
+	mutex_lock(&inode->i_mutex);
+	inode->i_flags |= S_SWAPFILE;
+	mutex_unlock(&inode->i_mutex);
+}
+
+static void file_truncate_unlock(struct file *filp)
+{
+	struct inode *inode = filp->f_mapping->host;
+
+	mutex_lock(&inode->i_mutex);
+	inode->i_flags &= ~S_SWAPFILE;
+	mutex_unlock(&inode->i_mutex);
+}
+
+/*
+ * Fill out split_io for taget backing store
+ */
+static void set_split_io(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+	if(test_bit(BLOCK_TYPE,&lc->flags))
+		/* Split I/O at block boundaries */
+		ti->split_io = 1 << (lc->blkbits - SECTOR_SHIFT);
+	else
+		ti->split_io = 64;
+	DMDEBUG("splitting io at %llu sector boundaries", 
+			(unsigned long long) ti->split_io);
+}
+
+/* 
+ * Check that the loop file is regular and available.
+ */
+static int loop_check_file(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+	struct file *filp = lc->filp;
+	struct inode *inode = filp->f_mapping->host;
+
+	if (!inode) {
+		return -ENXIO;
+	}
+
+	if (!S_ISREG(inode->i_mode)) {
+		DMERR("%s is not a regular file", lc->path);
+		return -EINVAL;
+	}
+
+	if (mapping_writably_mapped(filp->f_mapping)) {
+		DMERR("%s is mapped into userspace for writing", lc->path);
+		return -EBUSY;
+	}
+
+	if (mapping_mapped(filp->f_mapping))
+		DMWARN("%s is mapped into userspace", lc->path);
+
+	if (!inode->i_sb || !inode->i_sb->s_bdev) {
+		DMWARN("%s has no blockdevice - switching to filesystem I/O", lc->path);
+		clear_bit(BLOCK_TYPE, &lc->flags);
+		set_bit(FILE_TYPE, &lc->flags);
+	}
+
+	if (IS_SWAPFILE(inode)) {
+		DMERR("%s is already in use", lc->path);
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+/*
+ * Check loop file size and store it in the loop context
+ */
+static int loop_setup_size(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+	struct inode *inode = lc->filp->f_mapping->host;
+	int r = -EINVAL;
+
+	lc->size = i_size_read(inode);
+	lc->blkbits = inode->i_blkbits;
+
+	if (!lc->size){
+		ti->error = "backing file is empty";
+		goto out;
+	}
+
+	if (lc->size < to_bytes(blk2sec(lc,1))) {
+		ti->error = "backing file cannot be less than one block in size";
+		goto out;
+	}
+
+	if (lc->offset & ((1 << lc->blkbits) - 1)) {
+		ti->error = "loop file offset must be a multiple of fs blocksize";
+		goto out;
+	} 
+	if (lc->offset > to_sector(lc->size) - blk2sec(lc, 1)) {
+		ti->error = "loop file offset too large";
+		goto out;
+	}
+
+	lc->mapped_sectors = to_sector(inode->i_size) - lc->offset;
+
+	if (to_bytes(lc->mapped_sectors) < lc->size)
+		DMWARN("not using %llu bytes in incomplete block at EOF",
+		       lc->size - to_bytes(lc->mapped_sectors));
+
+	if (lc->size - lc->offset < to_bytes(ti->len)) {
+		ti->error = "mapped region cannot be smaller than target size";
+		goto out;
+	}
+
+	return 0;
+out:
+	return r;
+}
+
+/*
+ * release a loop file
+ */
+static void loop_put_file(struct file *filp)
+{
+	if (!filp)
+		return;
+
+	file_truncate_unlock(filp);
+	filp_close(filp, NULL);
+}
+
+/*
+ * open loop file and perform type, availability and size checks.
+ */
+static int loop_get_file(struct dm_target *ti)
+{
+	int flags = ((dm_table_get_mode(ti->table) & FMODE_WRITE) ?
+		    O_RDWR : O_RDONLY) | O_LARGEFILE;
+	struct loop_c *lc = ti->private;
+	struct file *filp;
+	int r = 0;
+
+	filp = filp_open(lc->path, flags, 0);
+
+	if (IS_ERR(filp))
+		return PTR_ERR(filp);
+
+	lc->filp = filp;
+
+	r = loop_check_file(ti);
+	if (r)
+		goto out_put;
+
+	r = loop_setup_size(ti);
+	if (r)
+		goto out_put;
+
+	file_truncate_lock(filp);
+	return 0;
+
+out_put:
+	fput(filp);
+	return r;
+	
+}
+
+/*
+ * invalidate mapped pages belonging to the loop file
+ */
+void loop_flush(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+
+	loop_invalidate_file(lc->filp);
+}
+
+/*--------------------------------------------------------------------
+ * Device-mapper target methods
+ *--------------------------------------------------------------------*/
+/*
+ * Generic loop map function. Re-base I/O to target begin and submit
+ * mapping request to ((struct loop_c *)ti->private)->map_fn.
+ */
+static int loop_map(struct dm_target *ti, struct bio *bio,
+					union map_info *context)
+{
+	struct loop_c *lc = ti->private;
+
+	if (unlikely(bio_barrier(bio)))
+		return -EOPNOTSUPP;
+	/* rebase bio to target begin */
+	bio->bi_sector -= ti->begin;
+	if (lc->map_fn)
+		return lc->map_fn(ti, bio);
+	BUG();
+	return -EIO;
+}
+
+/*
+ * Block status helper.
+ */
+static ssize_t loop_file_status(struct loop_c *lc, char *result, unsigned maxlen)
+{
+	ssize_t sz = 0;
+	struct file_map_c *fc = lc->map_data;
+	int qlen;
+
+	spin_lock_irq(&fc->lock);
+	qlen = bio_list_nr(&fc->work);
+	qlen += bio_list_nr(&fc->in);
+	spin_unlock_irq(&fc->lock);
+	DMEMIT("%d", qlen);
+	return sz;
+}
+
+/*
+ * File status helper.
+ */
+static ssize_t loop_block_status(struct loop_c *lc, char *result, unsigned maxlen)
+{
+	ssize_t sz = 0;
+	struct block_map_c *bc = lc->map_data;
+	int mru;
+	spin_lock_irq(&bc->mru_lock);
+	mru = bc->mru - bc->map;
+	spin_unlock_irq(&bc->mru_lock);
+	DMEMIT("%d %d", bc->nr_extents, mru);
+	return sz;
+}
+
+/*
+ * This needs some thought on handling unlinked backing files. some parts of
+ * the kernel return a cached name (now invalid), while others return a dcache
+ * "/path/to/foo (deleted)" name (never was/is valid). Which is better is
+ * debatable.
+ *
+ * On the one hand, using a cached name gives table output which is directly
+ * usable assuming the user re-creates the unlinked image file, on the other
+ * it is more consistent with e.g. swap to use the dcache name.
+*/
+static int loop_status(struct dm_target *ti, status_type_t type,
+				char *result, unsigned maxlen)
+{
+	struct loop_c *lc = (struct loop_c *) ti->private;
+	ssize_t sz = 0;
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		if (test_bit(BLOCK_TYPE, &lc->flags))
+			sz += loop_block_status(lc, result, maxlen - sz);
+		else if (test_bit(FILE_TYPE, &lc->flags))
+			sz += loop_file_status(lc, result, maxlen - sz);
+		break;
+
+	case STATUSTYPE_TABLE:
+		DMEMIT("%s %llu", lc->path, lc->offset);
+		break;
+	}
+	return 0;
+}
+
+/*
+ * Destroy a loopback mapping
+ */
+static void loop_dtr(struct dm_target *ti)
+{
+	struct loop_c *lc = ti->private;
+
+	
+	if ((dm_table_get_mode(ti->table) & FMODE_WRITE))
+		loop_invalidate_file(lc->filp);
+
+	if (test_bit(BLOCK_TYPE, &lc->flags) && lc->map_data)
+		destroy_block_map((struct block_map_c *)lc->map_data,
+				test_bit(VMALLOC, &lc->flags));
+	if (test_bit(FILE_TYPE, &lc->flags) && lc->map_data)
+		destroy_file_map(lc);
+
+	loop_put_file(lc->filp);
+	DMINFO("released file %s", lc->path);
+
+		
+	kfree(lc);
+}
+
+/*
+ * Construct a loopback mapping: <path> <offset>
+ */
+static int loop_ctr(struct dm_target *ti, unsigned argc, char **argv)
+{
+	struct loop_c *lc = NULL;
+	int r = -EINVAL;
+
+	if (argc != 2) {
+		ti->error = "invalid argument count";
+		goto out;
+	}
+
+	r = -ENOMEM;
+	lc = kzalloc(sizeof(*lc), GFP_KERNEL);
+	if (!lc) {
+		ti->error = "cannot allocate loop context";
+		goto out;
+	}
+	lc->path = kstrdup(argv[0], GFP_KERNEL);
+	if (!lc->path) {
+		ti->error = "cannot allocate loop path", -ENOMEM;
+		goto out;
+	}
+
+	ti->private = lc;
+
+	r = -EINVAL;
+	if (sscanf(argv[1], "%lld", &lc->offset) != 1) {
+		ti->error = "invalid file offset";
+		goto out;
+	}
+
+	/* defaults */
+	set_bit(BLOCK_TYPE, &lc->flags);
+	/* open & check file and set size parameters */
+	r = loop_get_file(ti);
+	if (r) {
+		ti->error = "could not open loop backing file";
+		goto out;
+	}
+
+	if (test_bit(BLOCK_TYPE, &lc->flags))
+		r = setup_block_map(lc, lc->filp->f_mapping->host);
+	if (test_bit(FILE_TYPE, &lc->flags))
+		r = setup_file_map(lc);
+	set_split_io(ti);
+
+	if (r) {
+		ti->error = "could not create extent map";
+		goto out_putf;
+	}
+
+	if (lc->bdev)
+		dm_set_device_limits(ti, lc->bdev);
+
+	DMDEBUG("constructed loop target on %s "
+		"(%lldk, %llu sectors)", lc->path,
+		(lc->size >> 10), lc->mapped_sectors);
+
+	return 0;
+
+out_putf:
+	loop_put_file(lc->filp);
+out:
+	if(lc)
+		kfree(lc);
+	return r;
+}
+
+static struct target_type loop_target = {
+	.name = "loop",
+	.version = {0, 0, 1},
+	.module = THIS_MODULE,
+	.ctr = loop_ctr,
+	.dtr = loop_dtr,
+	.map = loop_map,
+	.presuspend = loop_flush,
+	.flush = loop_flush,
+	.status = loop_status,
+};
+
+/*--------------------------------------------------------------------
+ * Module bits
+ *--------------------------------------------------------------------*/
+int __init dm_loop_init(void)
+{
+	int r;
+
+	r = dm_register_target(&loop_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto out;
+	}
+
+	r = -ENOMEM;
+
+	extent_cache = kmem_cache_create("extent_cache", sizeof(struct extent), 
+					0, SLAB_HWCACHE_ALIGN, NULL, NULL);
+	if (!extent_cache)
+		goto out;
+
+	DMINFO("registered %s", version);
+	return 0;
+
+out:
+	if (extent_cache)
+		kmem_cache_destroy(extent_cache);
+	return r;
+}
+
+void dm_loop_exit(void)
+{
+	int r;
+
+	r = dm_unregister_target(&loop_target);
+	kmem_cache_destroy(extent_cache);
+
+	if (r < 0)
+		DMERR("target unregister failed %d", r);
+	else
+		DMINFO("unregistered %s", version);
+}
+
+module_init(dm_loop_init);
+module_exit(dm_loop_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Bryn Reeves <breeves@redhat.com>");
+MODULE_DESCRIPTION("device-mapper loop target");

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-02-21  0:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-12  8:49 [PATCH 2.6.20] updated dm-loop patch roland
2007-02-12 15:24 ` Bryn M. Reeves
  -- strict thread matches above, loose matches on Subject: below --
2007-02-15 22:30 devzero
2007-02-15 11:45 devzero
2007-02-15 12:28 ` Bryn M. Reeves
2007-02-21  0:14   ` roland
2007-02-06 11:35 Bryn M. Reeves
2007-02-13 20:20 ` Bryn M. Reeves
2007-02-15  0:51   ` Bryn M. Reeves

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.