linux-fsdevel.vger.kernel.org archive mirror
* (unknown), 
@ 2017-10-15  3:28 redaccion
  0 siblings, 0 replies; 211+ messages in thread
From: redaccion @ 2017-10-15  3:28 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 367770.zip --]
[-- Type: application/zip, Size: 2805 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2019-01-15  2:55 Jens Axboe
  0 siblings, 0 replies; 211+ messages in thread
From: Jens Axboe @ 2019-01-15  2:55 UTC (permalink / raw)
  To: linux-fsdevel, linux-aio, linux-block, linux-arch; +Cc: hch, jmoyer, avi

Here's v4 of the io_uring interface. There are no user-visible changes
this time, apart from bumping the io_uring_sqe submission entry to a
full 64 bytes. This aligns better with cache lines and leaves us some
room to grow for future features. See the v3 posting for full
details on the API:

https://lore.kernel.org/linux-block/20190112213011.1439-1-axboe@kernel.dk/
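
To illustrate the 64-byte sizing mentioned above, here is a minimal
sketch. The struct name example_sqe, its field layout, and
EXAMPLE_CACHELINE are assumptions made for illustration only; they are
not copied from the patches, so see include/uapi/linux/io_uring.h in
the series for the real definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on x86-64. */
#define EXAMPLE_CACHELINE 64

/* Hypothetical submission entry, NOT the struct from the patch set. */
struct example_sqe {
	uint8_t  opcode;      /* operation type */
	uint8_t  flags;       /* per-entry flags */
	uint16_t ioprio;      /* request priority */
	int32_t  fd;          /* file descriptor to operate on */
	uint64_t off;         /* file offset */
	uint64_t addr;        /* user buffer address */
	uint32_t len;         /* buffer length */
	uint32_t op_flags;    /* operation-specific flags */
	uint64_t user_data;   /* opaque value handed back on completion */
	uint64_t reserved[3]; /* padding to 64 bytes: room to grow */
};

/* One entry fills exactly one cache line. */
static_assert(sizeof(struct example_sqe) == EXAMPLE_CACHELINE,
	      "example_sqe must be exactly one cache line");

/* Cache lines touched by n entries when the array base is line-aligned. */
static size_t example_lines_touched(size_t n)
{
	return (n * sizeof(struct example_sqe) + EXAMPLE_CACHELINE - 1) /
	       EXAMPLE_CACHELINE;
}
```

With this sizing, an entry never straddles two cache lines as long as
the array itself starts on a line boundary, and the reserved tail can
later be given a meaning without changing the entry size.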

What I neglected to mention in the v3 posting is that the fixed
buffer and fixed file interfaces are available through the
io_uring_register() system call. This means they can be registered
(and unregistered) independently of the io_uring context setup.
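
As a rough userspace sketch of registering and unregistering a fixed
buffer separately from ring setup: this assumes the interface found in
the mainline <linux/io_uring.h> header (IORING_REGISTER_BUFFERS taking
an array of struct iovec), which may well differ from the structs used
in this v4 posting. It also assumes kernel headers new enough to
define the syscall numbers. example_register_roundtrip is a
hypothetical helper name.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/io_uring.h>

/*
 * Create a small ring, register one buffer, unregister it again.
 * Returns 0 on success or a negative errno value (for example
 * -ENOSYS on kernels without io_uring).
 */
static int example_register_roundtrip(void)
{
	struct io_uring_params p;
	struct iovec iov;
	int fd, ret = 0;

	/* Step 1: context setup, with no buffers involved yet. */
	memset(&p, 0, sizeof(p));
	fd = (int)syscall(__NR_io_uring_setup, 4, &p);
	if (fd < 0)
		return -errno;

	iov.iov_len = 4096;
	iov.iov_base = aligned_alloc(4096, iov.iov_len);
	if (!iov.iov_base) {
		close(fd);
		return -ENOMEM;
	}

	/* Step 2: register the buffer later, as a separate call. */
	if (syscall(__NR_io_uring_register, fd, IORING_REGISTER_BUFFERS,
		    &iov, 1) < 0)
		ret = -errno;
	/* Step 3: drop the registration while the ring stays usable. */
	else if (syscall(__NR_io_uring_register, fd,
			 IORING_UNREGISTER_BUFFERS, NULL, 0) < 0)
		ret = -errno;

	free(iov.iov_base);
	close(fd);
	return ret;
}
```

The raw syscall() form is used here only to keep the sketch free of
library dependencies.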

Patches are against 5.0-rc2 and can also be found in my 'io_uring'
git branch:

git://git.kernel.dk/linux-block io_uring

Changes since v3:

- Clean up fixed buffer index validation
- Add IORING_OP_NOP for ring perf testing
- Drop struct io_kiocb ki_* variable prefix, it clashes with struct
  kiocb for no reason except to cause confusion
- Bump io_uring_sqe to 64 bytes. Cacheline sized and aligned
  (on x86-64), and more future proof
- Use kmalloc_array()
- Make the page mlock rlimit incremental and not for root / CAP_IPC_LOCK
- Ensure io_uring_register() can't race with fops->release()
- Simplify and improve iopoll implementation
- Use FOLL_WRITE instead of open-coding it
- Fix 32-bit vs 64-bit sizing for the io_uring_register() structs
- Added x86 32-bit system calls
- Added 32-bit compat mode
- Rebased on 5.0-rc2
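
One reading of the mlock rlimit item above is that pinned pages are
charged against a running per-user total, and that privileged tasks
skip the charge. The sketch below shows that idea in userspace C11.
It is an assumption about the intent, not code from the patches, and
every name in it (example_user, example_account_pages,
example_unaccount_pages) is hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-user accounting state. */
struct example_user {
	atomic_ulong locked_pages; /* pages currently charged */
};

/*
 * Charge nr_pages against the user's running total.
 * Returns 0 on success, -ENOMEM if the limit would be exceeded.
 * Privileged callers (think root / CAP_IPC_LOCK) are not charged.
 */
static int example_account_pages(struct example_user *u,
				 unsigned long nr_pages,
				 unsigned long limit_pages,
				 bool privileged)
{
	unsigned long cur, new_total;

	if (privileged)
		return 0;

	cur = atomic_load(&u->locked_pages);
	do {
		new_total = cur + nr_pages;
		/* Reject on wraparound or when over the limit. */
		if (new_total < cur || new_total > limit_pages)
			return -ENOMEM;
		/* On failure, cur is reloaded and the check repeats. */
	} while (!atomic_compare_exchange_weak(&u->locked_pages,
					       &cur, new_total));
	return 0;
}

/* Give pages back when a registration is torn down. */
static void example_unaccount_pages(struct example_user *u,
				    unsigned long nr_pages,
				    bool privileged)
{
	if (!privileged)
		atomic_fetch_sub(&u->locked_pages, nr_pages);
}
```

The compare-and-swap loop keeps the check and the update atomic with
respect to concurrent registrations by the same user, without taking
a lock.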


 Documentation/filesystems/vfs.txt      |    3 +
 arch/x86/entry/syscalls/syscall_32.tbl |    3 +
 arch/x86/entry/syscalls/syscall_64.tbl |    3 +
 block/bio.c                            |   59 +-
 fs/Makefile                            |    1 +
 fs/block_dev.c                         |   19 +-
 fs/file.c                              |   15 +-
 fs/file_table.c                        |    9 +-
 fs/gfs2/file.c                         |    2 +
 fs/io_uring.c                          | 2072 ++++++++++++++++++++++++
 fs/iomap.c                             |   48 +-
 fs/xfs/xfs_file.c                      |    1 +
 include/linux/bio.h                    |   14 +
 include/linux/blk_types.h              |    1 +
 include/linux/file.h                   |    2 +
 include/linux/fs.h                     |    6 +-
 include/linux/iomap.h                  |    1 +
 include/linux/sched/user.h             |    2 +-
 include/linux/syscalls.h               |    7 +
 include/uapi/linux/io_uring.h          |  155 ++
 init/Kconfig                           |    9 +
 kernel/sys_ni.c                        |    3 +
 22 files changed, 2395 insertions(+), 40 deletions(-)

-- 
Jens Axboe


* (unknown), 
@ 2018-01-29 17:17 Jones
  0 siblings, 0 replies; 211+ messages in thread
From: Jones @ 2018-01-29 17:17 UTC (permalink / raw)


This is in regard to an inheritance in your surname. Reply using your email address, stating your full name, for more details. Email me here (gertvm@dr.com).

* (unknown), 
@ 2017-11-15 14:44 Qing Chang
  0 siblings, 0 replies; 211+ messages in thread
From: Qing Chang @ 2017-11-15 14:44 UTC (permalink / raw)
  To: linux fsdevel

hi Linux



http://bit.ly/2iXiosH




Qing

* (unknown), 
@ 2017-11-06 19:51 Qing Chang
  0 siblings, 0 replies; 211+ messages in thread
From: Qing Chang @ 2017-11-06 19:51 UTC (permalink / raw)
  To: linux fsdevel

Hey Linux


http://bit.ly/2y5JOmP



Qing Chang

* (unknown), 
@ 2017-10-12 14:09 redaccion
  0 siblings, 0 replies; 211+ messages in thread
From: redaccion @ 2017-10-12 14:09 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 4319450234.zip --]
[-- Type: application/zip, Size: 2801 bytes --]

* (unknown), 
@ 2017-10-08 22:32 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-10-08 22:32 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 7210184386.zip --]
[-- Type: application/zip, Size: 7244 bytes --]

* (unknown), 
@ 2017-10-08  1:26 redaccion
  0 siblings, 0 replies; 211+ messages in thread
From: redaccion @ 2017-10-08  1:26 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 9108707.zip --]
[-- Type: application/zip, Size: 7119 bytes --]

* (unknown), 
@ 2017-10-04 16:11 1.10.0812112155390.21775
  0 siblings, 0 replies; 211+ messages in thread
From: 1.10.0812112155390.21775 @ 2017-10-04 16:11 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 2649741863647.zip --]
[-- Type: application/zip, Size: 7246 bytes --]

* (unknown), 
@ 2017-09-30 14:07 redaccion
  0 siblings, 0 replies; 211+ messages in thread
From: redaccion @ 2017-09-30 14:07 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 5283737024430.zip --]
[-- Type: application/zip, Size: 7153 bytes --]

* (unknown), 
@ 2017-09-29 15:21 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-09-29 15:21 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 033033737342463.zip --]
[-- Type: application/zip, Size: 7234 bytes --]

* (unknown), 
@ 2017-09-28  0:21 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-09-28  0:21 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 16782652.doc --]
[-- Type: application/msword, Size: 59577 bytes --]

* (unknown), 
@ 2017-09-13  4:21 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-09-13  4:21 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 629854663368780.doc --]
[-- Type: application/msword, Size: 43208 bytes --]

* (unknown), 
@ 2017-09-05 12:51 ifalqi
  0 siblings, 0 replies; 211+ messages in thread
From: ifalqi @ 2017-09-05 12:51 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 9731908610.doc --]
[-- Type: application/msword, Size: 73845 bytes --]

* (unknown), 
@ 2017-09-01 22:55 redaccion
  0 siblings, 0 replies; 211+ messages in thread
From: redaccion @ 2017-09-01 22:55 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 861574953961.doc --]
[-- Type: application/msword, Size: 40147 bytes --]

* (unknown), 
@ 2017-08-31  9:54 info
  0 siblings, 0 replies; 211+ messages in thread
From: info @ 2017-08-31  9:54 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 3622620.doc --]
[-- Type: application/msword, Size: 41541 bytes --]

* (unknown), 
@ 2017-08-31  0:58 info
  0 siblings, 0 replies; 211+ messages in thread
From: info @ 2017-08-31  0:58 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 52336.doc --]
[-- Type: application/msword, Size: 30930 bytes --]

* (unknown), 
@ 2017-08-30  0:38 ifalqi
  0 siblings, 0 replies; 211+ messages in thread
From: ifalqi @ 2017-08-30  0:38 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 650953515.doc --]
[-- Type: application/msword, Size: 30657 bytes --]

* (unknown), 
@ 2017-08-11 20:11 tammyehood
  0 siblings, 0 replies; 211+ messages in thread
From: tammyehood @ 2017-08-11 20:11 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 77412.zip --]
[-- Type: application/zip, Size: 2798 bytes --]

* (unknown), 
@ 2017-08-11 15:50 1.10.0812112155390.21775
  0 siblings, 0 replies; 211+ messages in thread
From: 1.10.0812112155390.21775 @ 2017-08-11 15:50 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 278428204.zip --]
[-- Type: application/zip, Size: 2803 bytes --]

* (unknown), 
@ 2017-08-11  6:14 администратор 
  0 siblings, 0 replies; 211+ messages in thread
From: администратор  @ 2017-08-11  6:14 UTC (permalink / raw)




Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; your mailbox is currently at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be
disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...9o76ypp2345t..2017
Mail Technical Support ©2017

Thank you
System Administrator

* (unknown), 
@ 2017-08-09 19:36 tammyehood
  0 siblings, 0 replies; 211+ messages in thread
From: tammyehood @ 2017-08-09 19:36 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 0058486113211.zip --]
[-- Type: application/zip, Size: 9996 bytes --]

* (unknown), 
@ 2017-08-09 14:34 shwx002
  0 siblings, 0 replies; 211+ messages in thread
From: shwx002 @ 2017-08-09 14:34 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 46684317829.zip --]
[-- Type: application/zip, Size: 10187 bytes --]

* (unknown), 
@ 2017-08-09 10:20 системы администратор
  0 siblings, 0 replies; 211+ messages in thread
From: системы администратор @ 2017-08-09 10:20 UTC (permalink / raw)


Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; your mailbox is currently at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...9o76ypp2345t..2017
Mail Technical Support ©2017

Thank you
System Administrator

* (unknown), 
@ 2017-08-09  0:41 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-08-09  0:41 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 44541508673885.zip --]
[-- Type: application/zip, Size: 2759 bytes --]

* (unknown), 
@ 2017-08-07 11:50 1.10.0812112155390.21775
  0 siblings, 0 replies; 211+ messages in thread
From: 1.10.0812112155390.21775 @ 2017-08-07 11:50 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: MESSAGE_10248647599809_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 2767 bytes --]

* (unknown), 
@ 2017-08-03 19:52 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-08-03 19:52 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 188047157183604.zip --]
[-- Type: application/zip, Size: 2952 bytes --]

* (unknown), 
@ 2017-08-02 12:55 tammyehood
  0 siblings, 0 replies; 211+ messages in thread
From: tammyehood @ 2017-08-02 12:55 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_7005561631_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 2815 bytes --]

* (unknown), 
@ 2017-08-02  3:45 системы администратор
  0 siblings, 0 replies; 211+ messages in thread
From: системы администратор @ 2017-08-02  3:45 UTC (permalink / raw)


Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; your mailbox is currently at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...776774990..2017
Mail Technical Support ©2017

Thank you
System Administrator

* (unknown), 
@ 2017-08-01 21:19 tammyehood
  0 siblings, 0 replies; 211+ messages in thread
From: tammyehood @ 2017-08-01 21:19 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_062920054084147_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 2835 bytes --]

* (unknown), 
@ 2017-08-01 19:35 anderslindgaard
  0 siblings, 0 replies; 211+ messages in thread
From: anderslindgaard @ 2017-08-01 19:35 UTC (permalink / raw)
  To: linux fsdevel

hiya Linux

http://www.maxtra.cl/index/wp-content/plugins/pixcodes/views/index_old.php?busy=gt2vetuv76w2yz1x

* (unknown), 
@ 2017-07-31 21:27 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-07-31 21:27 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_04030628274029_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 2678 bytes --]

* (unknown), 
@ 2017-07-26  2:25 tammyehood
  0 siblings, 0 replies; 211+ messages in thread
From: tammyehood @ 2017-07-26  2:25 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_2677628586_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 5660 bytes --]

* (unknown), 
@ 2017-07-25 14:56 nhossein4212003
  0 siblings, 0 replies; 211+ messages in thread
From: nhossein4212003 @ 2017-07-25 14:56 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_968240354671258_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 5752 bytes --]

* (unknown), 
@ 2017-07-18 11:36 shwx002
  0 siblings, 0 replies; 211+ messages in thread
From: shwx002 @ 2017-07-18 11:36 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: "EMAIL_64633847013_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 3281 bytes --]

* (unknown), 
@ 2017-07-10  3:45 системы администратор
  0 siblings, 0 replies; 211+ messages in thread
From: системы администратор @ 2017-07-10  3:45 UTC (permalink / raw)


Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; your mailbox is currently at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...9o76ypp2345t..2017
Mail Technical Support ©2017

Thank you
System Administrator

* (unknown), 
@ 2017-07-09 23:19 Corporate Lenders
  0 siblings, 0 replies; 211+ messages in thread
From: Corporate Lenders @ 2017-07-09 23:19 UTC (permalink / raw)


Good day,

I am Thomas Walter, the financial agent of this company, known as Corporate Lenders. We lend money to individuals and companies that need financial help. Do you have bad credit, or do you need money to pay your bills? We are using this medium to inform you that we can help you with any form of loan, such as refinancing, debt consolidation loans, personal loans, international loans, and business loans. We are pleased to offer you a loan at an interest rate as low as 3%.

Our mission is to provide our customers with a service that is fast, friendly, and stress-free. Normally, once we have all your information, it takes only one hour to approve the funding.

If you are interested, please fill out the loan application form.

Full name:
Gender:
Amount needed:
Duration:
Tel:
Do you speak English?

We await your reply.

You can reach us by email: info@corporatelendersonline.com
Kind regards,
Thomas Walter

* (unknown), 
@ 2017-07-05  7:00 benjamin
  0 siblings, 0 replies; 211+ messages in thread
From: benjamin @ 2017-07-05  7:00 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EBAY_36890034909_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 2342 bytes --]

* (unknown), 
@ 2017-07-03 14:13 tammyehood
  0 siblings, 0 replies; 211+ messages in thread
From: tammyehood @ 2017-07-03 14:13 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_531101184_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 3146 bytes --]

* (unknown), 
@ 2017-07-01 21:28 redaccion
  0 siblings, 0 replies; 211+ messages in thread
From: redaccion @ 2017-07-01 21:28 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_90244_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 3187 bytes --]

* (unknown), 
@ 2017-06-30  2:53 1.10.0812112155390.21775
  0 siblings, 0 replies; 211+ messages in thread
From: 1.10.0812112155390.21775 @ 2017-06-30  2:53 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 066785575956.zip --]
[-- Type: application/zip, Size: 3368 bytes --]

* (unknown), 
@ 2017-06-28  3:56 системы администратор
  0 siblings, 0 replies; 211+ messages in thread
From: системы администратор @ 2017-06-28  3:56 UTC (permalink / raw)


Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; your mailbox is currently at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...776774990..2017
Mail Technical Support ©2017

Thank you
System Administrator

* (unknown), 
@ 2017-06-27 11:59 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-06-27 11:59 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_468535330447271_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 3452 bytes --]

* (unknown), 
@ 2017-06-26 22:58 Anders Lind
  0 siblings, 0 replies; 211+ messages in thread
From: Anders Lind @ 2017-06-26 22:58 UTC (permalink / raw)
  To: linux fsdevel

Good morning Linux



http://www.me-lawoffice.com/cat_add.php?son=rmtgusk26880ceteu




Anders

* (unknown), 
@ 2017-06-24 15:41 benjamin
  0 siblings, 0 replies; 211+ messages in thread
From: benjamin @ 2017-06-24 15:41 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_09074482_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 7653 bytes --]

* (unknown), 
@ 2017-06-24 12:38 redaccion
  0 siblings, 0 replies; 211+ messages in thread
From: redaccion @ 2017-06-24 12:38 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_896297041142370_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 3459 bytes --]

* (unknown), 
@ 2017-06-24 11:55 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-06-24 11:55 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_32495_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 7861 bytes --]

* (unknown), 
@ 2017-06-20 22:49 redaccion
  0 siblings, 0 replies; 211+ messages in thread
From: redaccion @ 2017-06-20 22:49 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 3712936079.zip --]
[-- Type: application/zip, Size: 3453 bytes --]

* (unknown), 
@ 2017-06-15 13:50 pohut00
  0 siblings, 0 replies; 211+ messages in thread
From: pohut00 @ 2017-06-15 13:50 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 241385535172685.zip --]
[-- Type: application/zip, Size: 5327 bytes --]

* (unknown), 
@ 2017-06-12 19:12 nhossein4212003
  0 siblings, 0 replies; 211+ messages in thread
From: nhossein4212003 @ 2017-06-12 19:12 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 39025725073.zip --]
[-- Type: application/zip, Size: 3493 bytes --]

* (unknown), 
@ 2017-06-11 18:16 tammyehood
  0 siblings, 0 replies; 211+ messages in thread
From: tammyehood @ 2017-06-11 18:16 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 46536729943268.zip --]
[-- Type: application/zip, Size: 3145 bytes --]

* (unknown), 
@ 2017-06-11  4:42 1.10.0812112155390.21775
  0 siblings, 0 replies; 211+ messages in thread
From: 1.10.0812112155390.21775 @ 2017-06-11  4:42 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 99741.zip --]
[-- Type: application/zip, Size: 3194 bytes --]

* (unknown), 
@ 2017-06-11  3:28 redaccion
  0 siblings, 0 replies; 211+ messages in thread
From: redaccion @ 2017-06-11  3:28 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 4748827.zip --]
[-- Type: application/zip, Size: 3180 bytes --]

* (unknown), 
@ 2017-06-08 17:26 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-06-08 17:26 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 9078859657.zip --]
[-- Type: application/zip, Size: 3191 bytes --]

* (unknown), 
@ 2017-06-07 22:30 tammyehood
  0 siblings, 0 replies; 211+ messages in thread
From: tammyehood @ 2017-06-07 22:30 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 2944259303743.zip --]
[-- Type: application/zip, Size: 3184 bytes --]

* (unknown), 
@ 2017-06-07 14:00 1.10.0812112155390.21775
  0 siblings, 0 replies; 211+ messages in thread
From: 1.10.0812112155390.21775 @ 2017-06-07 14:00 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 470011811002.zip --]
[-- Type: application/zip, Size: 3194 bytes --]

* (unknown), 
@ 2017-06-07 11:43 nhossein4212003
  0 siblings, 0 replies; 211+ messages in thread
From: nhossein4212003 @ 2017-06-07 11:43 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 701241854906746.zip --]
[-- Type: application/zip, Size: 3106 bytes --]

* (unknown), 
@ 2017-06-06  7:19 From Lori J. Robinson
  0 siblings, 0 replies; 211+ messages in thread
From: From Lori J. Robinson @ 2017-06-06  7:19 UTC (permalink / raw)


Hello,

I am General Lori J. Robinson. I am presently in Afghanistan serving
on the UN/NATO military assignment here. I have an important matter to
discuss with you; kindly respond to me through my private mailbox,
lori_robinson.usa@hotmail.com, so that we can get to know each other
better. I hope to hear from you if you are also interested. Thanks,
and hoping to hear from you soon.

* (unknown), 
@ 2017-05-24 16:26 natasha.glauser
  0 siblings, 0 replies; 211+ messages in thread
From: natasha.glauser @ 2017-05-24 16:26 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 2605145.zip --]
[-- Type: application/zip, Size: 3160 bytes --]

* (unknown), 
@ 2017-05-23 16:29 benjamin
  0 siblings, 0 replies; 211+ messages in thread
From: benjamin @ 2017-05-23 16:29 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 73458503834697.zip --]
[-- Type: application/zip, Size: 3207 bytes --]

* (unknown), 
@ 2017-05-21  8:55 benjamin
  0 siblings, 0 replies; 211+ messages in thread
From: benjamin @ 2017-05-21  8:55 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 087833.zip --]
[-- Type: application/zip, Size: 2855 bytes --]

* (unknown), 
@ 2017-05-20 11:03 pohut00
  0 siblings, 0 replies; 211+ messages in thread
From: pohut00 @ 2017-05-20 11:03 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 82304.zip --]
[-- Type: application/zip, Size: 2820 bytes --]

* (unknown), 
@ 2017-05-17  7:10 1.10.0812112155390.21775
  0 siblings, 0 replies; 211+ messages in thread
From: 1.10.0812112155390.21775 @ 2017-05-17  7:10 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 056305.zip --]
[-- Type: application/zip, Size: 2929 bytes --]

* (unknown), 
@ 2017-04-28  8:36 администратор
  0 siblings, 0 replies; 211+ messages in thread
From: администратор @ 2017-04-28  8:36 UTC (permalink / raw)


Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; your mailbox is currently at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...635829wjxnxl....74990.RU.2017
Mail Technical Support ©2017

Thank you
System Administrator

* (unknown), 
@ 2017-04-21 17:40 Mr.Jerry Smith
  0 siblings, 0 replies; 211+ messages in thread
From: Mr.Jerry Smith @ 2017-04-21 17:40 UTC (permalink / raw)




We give out loans at a 3% interest rate, and we offer loans from $5,000 to $50,000,000.00. Are you looking to buy a house, a car, or a company, start up a trucking company, buy a truck, or take out a personal or business loan? Email us at jerryfunds11@inbox.lv with the amount needed and your phone number.

* (unknown), 
@ 2017-04-16 22:46 tammyehood
  0 siblings, 0 replies; 211+ messages in thread
From: tammyehood @ 2017-04-16 22:46 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: BILL-827602-linux-fsdevel.zip --]
[-- Type: application/zip, Size: 2036 bytes --]

* (unknown), 
@ 2017-04-16  6:21 shwx002
  0 siblings, 0 replies; 211+ messages in thread
From: shwx002 @ 2017-04-16  6:21 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_18980_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 3895 bytes --]

* (unknown), 
@ 2017-04-09 14:27 weingart
  0 siblings, 0 replies; 211+ messages in thread
From: weingart @ 2017-04-09 14:27 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 84450679_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 3630 bytes --]

* (unknown), 
@ 2017-04-06 13:49 benjamin
  0 siblings, 0 replies; 211+ messages in thread
From: benjamin @ 2017-04-06 13:49 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: 33943822134670.zip --]
[-- Type: application/zip, Size: 25 bytes --]

* (unknown), 
@ 2017-03-20  0:26 Qing Chang
  0 siblings, 0 replies; 211+ messages in thread
From: Qing Chang @ 2017-03-20  0:26 UTC (permalink / raw)
  To: linux fsdevel

hello Linux


http://www.skywalkers.gr/mmenuns4.php?largest=25gthun1ksmp5v5





Qing Chang

* (unknown), 
@ 2017-03-14 23:24 nhossein4212003
  0 siblings, 0 replies; 211+ messages in thread
From: nhossein4212003 @ 2017-03-14 23:24 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_84791057025623_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 4431 bytes --]

* (unknown), 
@ 2017-01-23 14:54 nhossein4212003
  0 siblings, 0 replies; 211+ messages in thread
From: nhossein4212003 @ 2017-01-23 14:54 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_17028_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 60028 bytes --]

* (unknown), 
@ 2017-01-03  6:57 системы администратор
  0 siblings, 0 replies; 211+ messages in thread
From: системы администратор @ 2017-01-03  6:57 UTC (permalink / raw)




Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; your mailbox is currently at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...776774990..2017
Mail Technical Support ©2017

Thank you
System Administrator

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2017-01-03  6:48 системы администратор
  0 siblings, 0 replies; 211+ messages in thread
From: системы администратор @ 2017-01-03  6:48 UTC (permalink / raw)




Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; usage currently stands at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...776774990..2017
Mail Technical Support ©2017

Thank you
System administrator

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2017-01-03  6:48 системы администратор
  0 siblings, 0 replies; 211+ messages in thread
From: системы администратор @ 2017-01-03  6:48 UTC (permalink / raw)




Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; usage currently stands at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...776774990..2017
Mail Technical Support ©2017

Thank you
System administrator

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2016-12-16 10:46 системы администратор
  0 siblings, 0 replies; 211+ messages in thread
From: системы администратор @ 2016-12-16 10:46 UTC (permalink / raw)


Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; usage currently stands at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...776774990..2016
Mail Technical Support ©2016

Thank you
System administrator

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2016-12-14  3:54 Mr Friedrich Mayrhofer
  0 siblings, 0 replies; 211+ messages in thread
From: Mr Friedrich Mayrhofer @ 2016-12-14  3:54 UTC (permalink / raw)



Good Day,

This is the second time i am sending you this mail.

I, Friedrich Mayrhofer Donate $ 1,000,000.00 to You, Email Me
personally for more details.

Regards.
Friedrich Mayrhofer







^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2016-10-31 12:51 Debra_Farmer/SSB/HIDOE
  0 siblings, 0 replies; 211+ messages in thread
From: Debra_Farmer/SSB/HIDOE @ 2016-10-31 12:51 UTC (permalink / raw)



I am Mrs. Gu Kailai and i intend to make a DONATION. Contact my personal E-mail Via: mrsgukailai@post.cz for more details:

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2016-10-22 14:52 ifalqi
  0 siblings, 0 replies; 211+ messages in thread
From: ifalqi @ 2016-10-22 14:52 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: EMAIL_42787235909309_linux-fsdevel.zip --]
[-- Type: application/zip, Size: 6022 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2015-10-26 10:18 Michael Wilke
  0 siblings, 0 replies; 211+ messages in thread
From: Michael Wilke @ 2015-10-26 10:18 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 23 bytes --]

unsubscribe linux-cifs

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2015-10-25 14:15 Paul, Baloyi
  0 siblings, 0 replies; 211+ messages in thread
From: Paul, Baloyi @ 2015-10-25 14:15 UTC (permalink / raw)
  To: Recipients

 Good Day,
In searching for my late client who died few years ago a long side his family in boat accident, I came across your last name which matches with my client last name. I hereby seek your consent to present you as beneficiary to his funds $27.5m since you have the same last name, There are claim file in Documents to enable you and claim the funds legitimately, if you interested, Please Contact me on paulcambobaloyi@gmail.com with your names, age, Phone and Nationality for more Details on how to claim the funds.
Best Regards,
Mr.Paul Baloyi

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
       [not found]                                                                                                                           ` <1625739759.3169361.1445752746358.JavaMail.yahoo@mail.yahoo.com>
@ 2015-10-25  6:01                                                                                                                             ` From Mrs Rosemary
  0 siblings, 0 replies; 211+ messages in thread
From: From Mrs Rosemary @ 2015-10-25  6:01 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 2070 bytes --]

My Dearest one,
How are you doing today? please kindly get back to me 

[-- Attachment #2: FROM MRS ROSEMARY ZANDILE553.pdf --]
[-- Type: application/pdf, Size: 7176 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
       [not found]                                                                                                                           ` <810791924.1429588.1445514273742.JavaMail.yahoo@mail.yahoo.com>
@ 2015-10-22 11:45                                                                                                                             ` From Mrs Rosemary
  0 siblings, 0 replies; 211+ messages in thread
From: From Mrs Rosemary @ 2015-10-22 11:45 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 1972 bytes --]

My Dearest one,

How are you doing today? please kindly get back to me 

[-- Attachment #2: FROM MRS ROSEMARY ZANDILE553.pdf --]
[-- Type: application/pdf, Size: 7176 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2015-09-23 17:11 jerryfunds24
  0 siblings, 0 replies; 211+ messages in thread
From: jerryfunds24 @ 2015-09-23 17:11 UTC (permalink / raw)
  To: Recipients

We Give Out Loans For 3% Interest Rate And We Offer Loans From $5,000 To $50,000,000.00, Are You Looking To Buy A House Car Or Company Or Start Up A Truck Company or Buy A Truck Or Personal Loans, Email Us At jerrysmith@inbox.lv  With Amount Needed And Phone Number.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2015-09-07 19:41 Mary Williams
  0 siblings, 0 replies; 211+ messages in thread
From: Mary Williams @ 2015-09-07 19:41 UTC (permalink / raw)


ARE YOU IN NEED OF LOAN @ 3% INTEREST RATE FOR BUSINESS AND  PRIVATE PURPOSES?
IF YES: FILL AND RETURN
Name: =======
Amount needed: ===
Duration: =====
country ======
Mobile number=======

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2015-08-30  2:16 jerryfunds5
  0 siblings, 0 replies; 211+ messages in thread
From: jerryfunds5 @ 2015-08-30  2:16 UTC (permalink / raw)
  To: Recipients

We Give Out Loans For 3% Interest Rate And We Offer Loans From $5,000 To $50,000,000.00, Are You Looking To Buy A House Car Or Company Or Start Up A Truck Company or Buy A Truck Or Personal Loans, Email Us At j.funds2000000@inbox.lv  With Amount Needed And Phone Number.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2015-08-29 18:38 jerryfunds23
  0 siblings, 0 replies; 211+ messages in thread
From: jerryfunds23 @ 2015-08-29 18:38 UTC (permalink / raw)
  To: Recipients

We Give Out Loans For 3% Interest Rate And We Offer Loans From $5,000 To $50,000,000.00, Are You Looking To Buy A House Car Or Company Or Start Up A Truck Company or Buy A Truck Or Personal Loans, Email Us At j.funds2000000@inbox.lv  With Amount Needed And Phone Number.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2015-08-20  7:12 Mark Singer
  0 siblings, 0 replies; 211+ messages in thread
From: Mark Singer @ 2015-08-20  7:12 UTC (permalink / raw)





Do you need an investor?
Our investors fund project and business. We also give out loan/credit to any individual and company at 3% interest rate yearly. For more information, Contact us via Email: devonfps@gmail.com 

If you need an investor or quick funding, forward your response ONLY to this E-mail: devonfps@gmail.com 
....
Haben Sie einen Investor brauchen?
Unsere Investoren Fonds Projekt- und Geschäfts. Wir geben auch Darlehen / Kredite an jeden einzelnen und Unternehmen bei 3% Zinsen jährlich. Für weitere Informationen, kontaktieren Sie uns per E-Mail: devonfps@gmail.com 

Wenn Sie ein Investor oder schnelle Finanzierung benötigen, senden Sie Ihre Antwort nur auf diese E-mail: devonfps@gmail.com --
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2015-07-01 11:53 Sasnett_Karen
  0 siblings, 0 replies; 211+ messages in thread
From: Sasnett_Karen @ 2015-07-01 11:53 UTC (permalink / raw)





Do you need an investor?

Do you need a business or personal loan?

We give loans to individuals and companies at 3% interest per year. For more information, contact us by email: omfcreditspa@hotmail.com



NOTE: Send your reply only to this email: omfcreditspa@hotmail.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2015-04-24  7:01 Amir A.
  0 siblings, 0 replies; 211+ messages in thread
From: Amir A. @ 2015-04-24  7:01 UTC (permalink / raw)





--
THANKS FOR YOUR LAST MAIL
THE INFORMATION NEEDED ARE
NAME
PHONE
FAX
ADDRESS
OCCUPATION
AGE
to enable my further discussion with you.
Best regards and wishes to you all.
Amir A. Khanmammadov
REPLY TO
amir2016@vera.com.uy
--


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
       [not found]                                                                                                     ` <1480763910.146593.1414958012342.JavaMail.yahoo@jws10033.mail.ne1.yahoo.com>
@ 2014-11-02 19:54                                                                                                       ` MRS GRACE MANDA
  0 siblings, 0 replies; 211+ messages in thread
From: MRS GRACE MANDA @ 2014-11-02 19:54 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 71 bytes --]









This is Mrs Grace Manda (  Please I need your Help is Urgent). 

[-- Attachment #2: Mrs Grace Manda.rtf --]
[-- Type: application/rtf, Size: 35796 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2014-10-15 15:01 Steve French
  0 siblings, 0 replies; 211+ messages in thread
From: Steve French @ 2014-10-15 15:01 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel

smb3 also fails the new xfstest generic/035 (as does nfs, but for different
reasons), although cifs works.

Looks like we need to implement a rename_pending_delete worker function
for smb2/smb2.1/smb3 (as cifs has).

-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2014-09-18 14:15 Maria Caballero
  0 siblings, 0 replies; 211+ messages in thread
From: Maria Caballero @ 2014-09-18 14:15 UTC (permalink / raw)



Loan Offer contact us for  more details (gibonline11@gmail.com<mailto:gibonline11@gmail.com>)
All Details should be forward to this E-mail address for fast respond: gibonline11@gmail.com<mailto:gibonline11@gmail.com>

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2014-03-23 13:48 Fiser, Sarah A.
  0 siblings, 0 replies; 211+ messages in thread
From: Fiser, Sarah A. @ 2014-03-23 13:48 UTC (permalink / raw)



Fast and urgent funding for you, if interested, contact us via: bevloanservicess@webadicta.org<mailto:bevloanservicess@webadicta.org>
============================================================================================
schnelle und dringende Finanzierung für Sie, bei Interesse, kontaktieren Sie uns per E-Mail: bevloanservicess@webadicta.org<mailto:bevloanservicess@webadicta.org>

________________________________
The information contained in this e-mail message is intended solely for
the recipient(s) and may contain privileged information. Tampering with
or altering the contents of this message is prohibited. This information
is the same as any written document and may be subject to all rules
governing public information according to Florida Statutes. Any message
that falls under Chapter 119 shall not be altered in a manner that
misrepresents the activities of Orange County Public Schools.

[References: Florida State Constitution I.24, Florida State Statutes
Chapter 119, and OCPS Management Directive A-9.] If you have received
this message in error, or are not the named recipient notify the sender
and delete this message from your computer.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2014-02-01 12:05 Raymond Singh
  0 siblings, 0 replies; 211+ messages in thread
From: Raymond Singh @ 2014-02-01 12:05 UTC (permalink / raw)




We are now providing business & personal loans:
 
At superb rates- Starting from 2.0%
Flexible Repayment period- 2 to 30 Years.
For more information and Application, Please reply.

To unsubscribe please reply with "unsubscribe" as subject

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2013-11-25 15:59 Steve French
  0 siblings, 0 replies; 211+ messages in thread
From: Steve French @ 2013-11-25 15:59 UTC (permalink / raw)
  To: samba-technical, linux-cifs, linux-fsdevel; +Cc: David Disseldorp

[-- Attachment #1: Type: text/plain, Size: 2675 bytes --]

From f19e84df37bda502a2248d507a9cf2b9e693279e Mon Sep 17 00:00:00 2001
From: Steve French <smfrench@gmail.com>
Date: Sun, 24 Nov 2013 21:53:17 -0600
Subject: [PATCH] [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload

Change cifs.ko to using CIFS_IOCTL_COPYCHUNK instead
of BTRFS_IOC_CLONE to avoid confusion about whether
copy-on-write is required or optional for this operation.

SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since
they both speed up copy by offloading the copy rather than
passing many read and write requests back and forth and both have
identical syntax (passing file handles), but for SMB2/SMB3
CopyChunk the server is not required to use copy-on-write
to make a copy of the file (although some do), and Christoph
has commented that since CopyChunk does not require
copy-on-write we should not reuse BTRFS_IOC_CLONE.

This patch renames the ioctl to use a cifs specific IOCTL
CIFS_IOCTL_COPYCHUNK.  This ioctl is particularly important
for SMB2/SMB3 since large file copy over the network otherwise
can be very slow, and with this is often more than 100 times
faster putting less load on server and client.

Note that if a copy syscall is ever introduced, depending on
its requirements/format it could end up using one of the other
three methods that CIFS/SMB2/SMB3 protocol allows for copy offload,
but this method is particularly useful for file copy
and broadly supported (not just by Samba server).

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Disseldorp <ddiss@samba.org>
---
 fs/cifs/ioctl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
index 409b45e..7749230 100644
--- a/fs/cifs/ioctl.c
+++ b/fs/cifs/ioctl.c
@@ -26,13 +26,15 @@
 #include <linux/mount.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
-#include <linux/btrfs.h>
 #include "cifspdu.h"
 #include "cifsglob.h"
 #include "cifsproto.h"
 #include "cifs_debug.h"
 #include "cifsfs.h"

+#define CIFS_IOCTL_MAGIC    0xCF
+#define CIFS_IOC_COPYCHUNK_FILE    _IOW(CIFS_IOCTL_MAGIC, 3, int)
+
 static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
             unsigned long srcfd, u64 off, u64 len, u64 destoff)
 {
@@ -213,7 +215,7 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
                 cifs_dbg(FYI, "set compress flag rc %d\n", rc);
             }
             break;
-        case BTRFS_IOC_CLONE:
+        case CIFS_IOC_COPYCHUNK_FILE:
             rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0);
             break;
         default:
-- 
1.8.3.1


-- 
Thanks,

Steve
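
For illustration only, a user-space caller of the renamed ioctl might look
like the sketch below. The two macro values are copied from the patch; the
helper name cifs_copychunk_file() and its error convention are hypothetical
and not part of cifs.ko or any library. It assumes both files live on the
same cifs mount and that the source descriptor is passed as the ioctl
argument, as cifs_ioctl() does when it calls cifs_ioctl_clone().

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Copied from the patch; private to cifs.ko, not exported in a uapi header. */
#define CIFS_IOCTL_MAGIC	0xCF
#define CIFS_IOC_COPYCHUNK_FILE	_IOW(CIFS_IOCTL_MAGIC, 3, int)

/*
 * Hypothetical helper (not part of the patch): ask the server to copy the
 * whole of src_path into dst_path. Returns 0 on success, -1 with errno set
 * on failure.
 */
static int cifs_copychunk_file(const char *src_path, const char *dst_path)
{
	int src_fd, dst_fd, rc, saved_errno;

	src_fd = open(src_path, O_RDONLY);
	if (src_fd < 0)
		return -1;

	/* The target must be open for writing for the copy to be accepted. */
	dst_fd = open(dst_path, O_WRONLY | O_CREAT, 0644);
	if (dst_fd < 0) {
		saved_errno = errno;
		close(src_fd);
		errno = saved_errno;
		return -1;
	}

	/* The source descriptor is passed by value as the ioctl argument. */
	rc = ioctl(dst_fd, CIFS_IOC_COPYCHUNK_FILE, src_fd);

	/* Preserve the ioctl's errno across the close() calls. */
	saved_errno = errno;
	close(dst_fd);
	close(src_fd);
	errno = saved_errno;
	return rc < 0 ? -1 : 0;
}
```

A caller would presumably fall back to an ordinary read/write copy when the
helper returns -1, since a file system other than cifs is not expected to
recognize this ioctl number.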

[-- Attachment #2: 0001-CIFS-Do-not-use-btrfs-refcopy-ioctl-for-SMB2-copy-of.patch --]
[-- Type: text/x-patch, Size: 2565 bytes --]

From f19e84df37bda502a2248d507a9cf2b9e693279e Mon Sep 17 00:00:00 2001
From: Steve French <smfrench@gmail.com>
Date: Sun, 24 Nov 2013 21:53:17 -0600
Subject: [PATCH] [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload

Change cifs.ko to using CIFS_IOCTL_COPYCHUNK instead
of BTRFS_IOC_CLONE to avoid confusion about whether
copy-on-write is required or optional for this operation.

SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since
they both speed up copy by offloading the copy rather than
passing many read and write requests back and forth and both have
identical syntax (passing file handles), but for SMB2/SMB3
CopyChunk the server is not required to use copy-on-write
to make a copy of the file (although some do), and Christoph
has commented that since CopyChunk does not require
copy-on-write we should not reuse BTRFS_IOC_CLONE.

This patch renames the ioctl to use a cifs specific IOCTL
CIFS_IOCTL_COPYCHUNK.  This ioctl is particularly important
for SMB2/SMB3 since large file copy over the network otherwise
can be very slow, and with this is often more than 100 times
faster putting less load on server and client.

Note that if a copy syscall is ever introduced, depending on
its requirements/format it could end up using one of the other
three methods that CIFS/SMB2/SMB3 can do for copy offload,
but this method is particularly useful for file copy
and broadly supported (not just by Samba server).

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Disseldorp <ddiss@samba.org>
---
 fs/cifs/ioctl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
index 409b45e..7749230 100644
--- a/fs/cifs/ioctl.c
+++ b/fs/cifs/ioctl.c
@@ -26,13 +26,15 @@
 #include <linux/mount.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
-#include <linux/btrfs.h>
 #include "cifspdu.h"
 #include "cifsglob.h"
 #include "cifsproto.h"
 #include "cifs_debug.h"
 #include "cifsfs.h"
 
+#define CIFS_IOCTL_MAGIC	0xCF
+#define CIFS_IOC_COPYCHUNK_FILE	_IOW(CIFS_IOCTL_MAGIC, 3, int)
+
 static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
 			unsigned long srcfd, u64 off, u64 len, u64 destoff)
 {
@@ -213,7 +215,7 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
 				cifs_dbg(FYI, "set compress flag rc %d\n", rc);
 			}
 			break;
-		case BTRFS_IOC_CLONE:
+		case CIFS_IOC_COPYCHUNK_FILE:
 			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0);
 			break;
 		default:
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2013-10-17 20:35 Steve French
  0 siblings, 0 replies; 211+ messages in thread
From: Steve French @ 2013-10-17 20:35 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel,
	samba-technical, David Disseldorp

[-- Attachment #1: Type: text/plain, Size: 10793 bytes --]

From 6b6503530681165dccf2ce59eb631542ec58288c Mon Sep 17 00:00:00 2001
From: Steve French <smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Thu, 17 Oct 2013 14:16:33 -0500
Subject: [PATCH] [CIFS] SMB2/SMB3 Copy offload support (refcopy) phase 1

This first patch adds the ability for us to do a server side copy
(ie fast copy offloaded to the server)

"cp --reflink"

of one file to another located on the same server.  This
is much faster than traditional copy (which requires
reading and writing over the network and extra
memcpys).

This first version is not going to copy
files larger than about 1MB (to Samba) until I add
support for multiple chunks and for autoconfiguring
the chunksize.  To work to Samba it requires Samba 4.1 or later and
David Disseldorp's recently posted small Samba server patch.
It does work to Windows.

It includes:
1) processing of the ioctl (IOC_CLONE, similar to btrfs)
2) marshalling and sending the SMB2/SMB3 fsctl over the network
3) simple parsing of the response

It does not yet include (these will come in follow-on patches soon):
1) support for multiple chunks
2) support for autoconfiguring and remembering the chunksize
3) support for the older style copychunk which Samba 4.1 server supports
(because this would require read permission on the target file, which
cp does not give you, apparently per POSIX).  Use of COPYCHUNK to
Samba 4.1 server (pre-David's patch) may require
a distinct tool (other than cp) and another (trivial) ioctl to implement.
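
As a rough user-space illustration of the marshalling step, the sketch
below packs a single-chunk copychunk request into a byte buffer. The field
order (24-byte resume key, ChunkCount, Reserved, SourceOffset, TargetOffset,
Length, Reserved2) is an assumption taken from the assignments made in
smb2_clone_range() and the usual SRV_COPYCHUNK_COPY layout; the
authoritative definition is struct copychunk_ioctl in fs/cifs/smb2pdu.h.
The names marshal_copychunk(), put_le32() and put_le64() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RES_KEY_SIZE	24	/* assumed resume key size */
#define COPYCHUNK_SIZE	56	/* 24 + 4 + 4 + 8 + 8 + 4 + 4 */

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Store a 64-bit value in little-endian byte order. */
static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Hypothetical helper: build a one-chunk request. Returns 0 on success,
 * -1 if len does not fit the 32-bit Length field.
 */
static int marshal_copychunk(uint8_t buf[COPYCHUNK_SIZE],
			     const uint8_t key[RES_KEY_SIZE],
			     uint64_t src_off, uint64_t dst_off, uint64_t len)
{
	if (len > UINT32_MAX)
		return -1;

	memcpy(buf, key, RES_KEY_SIZE);		/* SourceKey */
	put_le32(buf + 24, 1);			/* ChunkCount */
	put_le32(buf + 28, 0);			/* Reserved */
	put_le64(buf + 32, src_off);		/* SourceOffset */
	put_le64(buf + 40, dst_off);		/* TargetOffset */
	put_le32(buf + 48, (uint32_t)len);	/* Length */
	put_le32(buf + 52, 0);			/* Reserved2 */
	return 0;
}
```

The explicit length check is a choice made for this sketch: the request
carries the length in 32 bits, so a larger range would have to be rejected
or split into several chunks rather than silently truncated.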

Signed-off-by: Steve French <smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 fs/cifs/cifsglob.h |   3 ++
 fs/cifs/ioctl.c    | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/smb2ops.c  |  82 ++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/smb2pdu.h  |  15 +++++++-
 4 files changed, 202 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index de3e3e0..a67cf12 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -381,6 +381,9 @@ struct smb_version_operations {
  char * (*create_lease_buf)(u8 *, u8);
  /* parse lease context buffer and return oplock/epoch info */
  __u8 (*parse_lease_buf)(void *, unsigned int *);
+ int (*clone_range)(const unsigned int, struct cifsFileInfo *src_file,
+ struct cifsFileInfo *target_file, u64 src_off, u64 len,
+ u64 dest_off);
 };

 struct smb_version_values {
diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
index ba54bf6..d353f6c 100644
--- a/fs/cifs/ioctl.c
+++ b/fs/cifs/ioctl.c
@@ -22,12 +22,112 @@
  */

 #include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/mount.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/btrfs.h>
 #include "cifspdu.h"
 #include "cifsglob.h"
 #include "cifsproto.h"
 #include "cifs_debug.h"
 #include "cifsfs.h"

+static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
+ unsigned long srcfd, u64 off, u64 len, u64 destoff)
+{
+ int rc;
+ struct cifsFileInfo *smb_file_target = dst_file->private_data;
+ struct inode *target_inode = file_inode(dst_file);
+ struct cifs_tcon *target_tcon;
+ struct fd src_file;
+ struct cifsFileInfo *smb_file_src;
+ struct inode *src_inode;
+ struct cifs_tcon *src_tcon;
+
+ cifs_dbg(FYI, "ioctl clone range\n");
+ /* the destination must be opened for writing */
+ if (!(dst_file->f_mode & FMODE_WRITE)) {
+ cifs_dbg(FYI, "file target not open for write\n");
+ return -EINVAL;
+ }
+
+ /* check if target volume is readonly and take reference */
+ rc = mnt_want_write_file(dst_file);
+ if (rc) {
+ cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
+ return rc;
+ }
+
+ src_file = fdget(srcfd);
+ if (!src_file.file) {
+ rc = -EBADF;
+ goto out_drop_write;
+ }
+
+ if ((!src_file.file->private_data) || (!dst_file->private_data)) {
+ rc = -EBADF;
+ cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
+ goto out_fput;
+ }
+
+ rc = -EXDEV;
+ smb_file_target = dst_file->private_data;
+ smb_file_src = src_file.file->private_data;
+ src_tcon = tlink_tcon(smb_file_src->tlink);
+ target_tcon = tlink_tcon(smb_file_target->tlink);
+
+ /* check if source and target are on same tree connection */
+ if (src_tcon != target_tcon) {
+ cifs_dbg(VFS, "file copy src and target on different volume\n");
+ goto out_fput;
+ }
+
+ src_inode = src_file.file->f_dentry->d_inode;
+
+ /* Note: cifs case is easier than btrfs since server responsible for */
+ /* checks for proper open modes and file type and if it wants */
+ /* server could even support copy of range where source = target */
+
+ /* so we do not deadlock racing two ioctls on same files */
+ /* btrfs does a similar check */
+ if (target_inode < src_inode) {
+ mutex_lock_nested(&target_inode->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_CHILD);
+ } else {
+ mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&target_inode->i_mutex, I_MUTEX_CHILD);
+ }
+
+ /* determine range to clone */
+ rc = -EINVAL;
+ if (off + len > src_inode->i_size || off + len < off)
+ goto out_unlock;
+ if (len == 0)
+ len = src_inode->i_size - off;
+
+ cifs_dbg(FYI, "about to flush pages\n");
+ /* should we flush first and last page first */
+ truncate_inode_pages_range(&target_inode->i_data, destoff,
+   PAGE_CACHE_ALIGN(destoff + len)-1);
+
+ if (target_tcon->ses->server->ops->clone_range)
+ rc = target_tcon->ses->server->ops->clone_range(xid,
+ smb_file_src, smb_file_target, off, len, destoff);
+
+ /* force revalidate of size and timestamps of target file now
+   that target is updated on the server */
+ CIFS_I(target_inode)->time = 0;
+out_unlock:
+ mutex_unlock(&src_inode->i_mutex);
+ mutex_unlock(&target_inode->i_mutex);
+out_fput:
+ fdput(src_file);
+out_drop_write:
+ mnt_drop_write_file(dst_file);
+ return rc;
+}
+
 long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
 {
  struct inode *inode = file_inode(filep);
@@ -105,6 +205,9 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
  cifs_dbg(FYI, "set compress flag rc %d\n", rc);
  }
  break;
+ case BTRFS_IOC_CLONE:
+ rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0);
+ break;
  default:
  cifs_dbg(FYI, "unsupported ioctl\n");
  break;
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index c571be8..11dde4b 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -494,6 +494,85 @@ smb2_close_file(const unsigned int xid, struct cifs_tcon *tcon,
 }

 static int
+SMB2_request_res_key(const unsigned int xid, struct cifs_tcon *tcon,
+     u64 persistent_fid, u64 volatile_fid,
+     struct copychunk_ioctl *pcchunk)
+{
+ int rc;
+ unsigned int ret_data_len;
+ struct resume_key_req *res_key;
+
+ rc = SMB2_ioctl(xid, tcon, persistent_fid, volatile_fid,
+ FSCTL_SRV_REQUEST_RESUME_KEY, true /* is_fsctl */,
+ NULL, 0 /* no input */,
+ (char **)&res_key, &ret_data_len);
+
+ if (rc) {
+ cifs_dbg(VFS, "refcpy ioctl error %d getting resume key\n", rc);
+ goto req_res_key_exit;
+ }
+ if (ret_data_len < sizeof(struct resume_key_req)) {
+ cifs_dbg(VFS, "Invalid refcopy resume key length\n");
+ rc = -EINVAL;
+ goto req_res_key_exit;
+ }
+ memcpy(pcchunk->SourceKey, res_key->ResumeKey, COPY_CHUNK_RES_KEY_SIZE);
+
+req_res_key_exit:
+ kfree(res_key);
+ return rc;
+}
+
+static int
+smb2_clone_range(const unsigned int xid,
+ struct cifsFileInfo *srcfile,
+ struct cifsFileInfo *trgtfile, u64 src_off,
+ u64 len, u64 dest_off)
+{
+ int rc;
+ unsigned int ret_data_len;
+ struct copychunk_ioctl *pcchunk;
+ char *retbuf = NULL;
+
+ pcchunk = kmalloc(sizeof(struct copychunk_ioctl), GFP_KERNEL);
+
+ if (pcchunk == NULL)
+ return -ENOMEM;
+
+ cifs_dbg(FYI, "in smb2_clone_range - about to call request res key\n");
+ /* Request a key from the server to identify the source of the copy */
+ rc = SMB2_request_res_key(xid, tlink_tcon(srcfile->tlink),
+ srcfile->fid.persistent_fid,
+ srcfile->fid.volatile_fid, pcchunk);
+
+ /* Note: request_res_key sets res_key null only if rc !=0 */
+ if (rc)
+ goto cchunk_out;
+
+ /* For now array only one chunk long, will make more flexible later */
+ pcchunk->ChunkCount = __constant_cpu_to_le32(1);
+ pcchunk->Reserved = 0;
+ pcchunk->SourceOffset = cpu_to_le64(src_off);
+ pcchunk->TargetOffset = cpu_to_le64(dest_off);
+ pcchunk->Length = cpu_to_le32(len);
+ pcchunk->Reserved2 = 0;
+
+ /* Request that server copy to target from src file identified by key */
+ rc = SMB2_ioctl(xid, tlink_tcon(trgtfile->tlink),
+ trgtfile->fid.persistent_fid,
+ trgtfile->fid.volatile_fid, FSCTL_SRV_COPYCHUNK_WRITE,
+ true /* is_fsctl */, (char *)pcchunk,
+ sizeof(struct copychunk_ioctl), &retbuf, &ret_data_len);
+
+ /* BB need to special case rc = EINVAL to alter chunk size */
+
+ cifs_dbg(FYI, "rc %d data length out %d\n", rc, ret_data_len);
+cchunk_out:
+ kfree(pcchunk);
+ return rc;
+}
+
+static int
 smb2_flush_file(const unsigned int xid, struct cifs_tcon *tcon,
  struct cifs_fid *fid)
 {
@@ -1017,6 +1096,7 @@ struct smb_version_operations smb20_operations = {
  .set_oplock_level = smb2_set_oplock_level,
  .create_lease_buf = smb2_create_lease_buf,
  .parse_lease_buf = smb2_parse_lease_buf,
+ .clone_range = smb2_clone_range,
 };

 struct smb_version_operations smb21_operations = {
@@ -1090,6 +1170,7 @@ struct smb_version_operations smb21_operations = {
  .set_oplock_level = smb21_set_oplock_level,
  .create_lease_buf = smb2_create_lease_buf,
  .parse_lease_buf = smb2_parse_lease_buf,
+ .clone_range = smb2_clone_range,
 };

 struct smb_version_operations smb30_operations = {
@@ -1165,6 +1246,7 @@ struct smb_version_operations smb30_operations = {
  .set_oplock_level = smb3_set_oplock_level,
  .create_lease_buf = smb3_create_lease_buf,
  .parse_lease_buf = smb3_parse_lease_buf,
+ .clone_range = smb2_clone_range,
 };

 struct smb_version_values smb20_values = {
diff --git a/fs/cifs/smb2pdu.h b/fs/cifs/smb2pdu.h
index 6183b1b..b50a129 100644
--- a/fs/cifs/smb2pdu.h
+++ b/fs/cifs/smb2pdu.h
@@ -534,9 +534,16 @@ struct create_durable {
  } Data;
 } __packed;

+#define COPY_CHUNK_RES_KEY_SIZE 24
+struct resume_key_req {
+ char ResumeKey[COPY_CHUNK_RES_KEY_SIZE];
+ __le32 ContextLength; /* MBZ */
+ char Context[0]; /* ignored, Windows sets to 4 bytes of zero */
+} __packed;
+
 /* this goes in the ioctl buffer when doing a copychunk request */
 struct copychunk_ioctl {
- char SourceKey[24];
+ char SourceKey[COPY_CHUNK_RES_KEY_SIZE];
  __le32 ChunkCount; /* we are only sending 1 */
  __le32 Reserved;
  /* array will only be one chunk long for us */
@@ -546,6 +553,12 @@ struct copychunk_ioctl {
  __u32 Reserved2;
 } __packed;

+struct copychunk_ioctl_rsp {
+ __le32 ChunksWritten;
+ __le32 ChunkBytesWritten;
+ __le32 TotalBytesWritten;
+} __packed;
+
 /* Response and Request are the same format */
 struct validate_negotiate_info {
  __le32 Capabilities;
-- 
1.7.11.7
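
To make the single-chunk request above concrete, here is a minimal
user-space sketch of how the copychunk buffer would be laid out on the
wire. It is an illustration under stated assumptions, not the kernel
code: the field order follows struct copychunk_ioctl and the assignments
in smb2_clone_range(), while fill_copychunk(), put_le32(), put_le64()
and COPYCHUNK_REQ_SIZE are hypothetical names. The sketch also rejects a
length that does not fit the 32-bit Length field, a check the posted
patch does not make (it passes a u64 len to cpu_to_le32()).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define COPY_CHUNK_RES_KEY_SIZE 24
/* Hypothetical name: 24 (key) + 4 + 4 + 8 + 8 + 4 + 4 bytes, packed. */
#define COPYCHUNK_REQ_SIZE 56

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Store a 64-bit value in little-endian byte order. */
static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Fill a one-chunk copychunk request.
 * Returns 0, or -EINVAL if len cannot be represented in the
 * 32-bit Length field (an assumption of this sketch).
 */
static int fill_copychunk(uint8_t buf[COPYCHUNK_REQ_SIZE],
			  const uint8_t key[COPY_CHUNK_RES_KEY_SIZE],
			  uint64_t src_off, uint64_t dest_off, uint64_t len)
{
	if (len > UINT32_MAX)
		return -EINVAL;

	memset(buf, 0, COPYCHUNK_REQ_SIZE);	/* zeroes Reserved, Reserved2 */
	memcpy(buf, key, COPY_CHUNK_RES_KEY_SIZE);	/* SourceKey */
	put_le32(buf + 24, 1);			/* ChunkCount: one chunk */
	put_le64(buf + 32, src_off);		/* SourceOffset */
	put_le64(buf + 40, dest_off);		/* TargetOffset */
	put_le32(buf + 48, (uint32_t)len);	/* Length */
	return 0;
}
```

Building the buffer byte by byte avoids relying on compiler packing or
host endianness, which is what __packed and the cpu_to_le*() helpers
take care of in the kernel version.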



-- 
Thanks,

Steve

[-- Attachment #2: 0001-CIFS-SMB2-SMB3-Copy-offload-support-refcopy-phase-1.patch --]
[-- Type: application/octet-stream, Size: 10596 bytes --]

From 6b6503530681165dccf2ce59eb631542ec58288c Mon Sep 17 00:00:00 2001
From: Steve French <smfrench@gmail.com>
Date: Thu, 17 Oct 2013 14:16:33 -0500
Subject: [PATCH] [CIFS] SMB2/SMB3 Copy offload support (refcopy) phase 1

This first patch adds the ability to do a server-side copy
(i.e. a fast copy offloaded to the server, aka refcopy)

"cp --reflink"

of one file to another located on the same server.  This
is much faster than a traditional copy (which requires
reading and writing over the network and extra
memcpys).

This first version cannot copy files larger than about
1MB (to Samba) until I add support for multiple chunks
and for autoconfiguring the chunksize.

It includes:
1) processing of the ioctl
2) marshalling and sending the SMB2/SMB3 fsctl over the network
3) simple parsing of the response

It does not yet include (these will come in follow-on patches soon):
1) support for multiple chunks
2) support for autoconfiguring and remembering the chunksize
3) support for the older style copychunk which the Samba 4.1 server
supports (because this requires write permission on the target file,
which cp does not give you, apparently per POSIX).  This may require
a distinct tool (other than cp) and another ioctl to implement.

Signed-off-by: Steve French <smfrench@gmail.com>
---
 fs/cifs/cifsglob.h |   3 ++
 fs/cifs/ioctl.c    | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/smb2ops.c  |  82 ++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/smb2pdu.h  |  15 +++++++-
 4 files changed, 202 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index de3e3e0..a67cf12 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -381,6 +381,9 @@ struct smb_version_operations {
 	char * (*create_lease_buf)(u8 *, u8);
 	/* parse lease context buffer and return oplock/epoch info */
 	__u8 (*parse_lease_buf)(void *, unsigned int *);
+	int (*clone_range)(const unsigned int, struct cifsFileInfo *src_file,
+			struct cifsFileInfo *target_file, u64 src_off, u64 len,
+			u64 dest_off);
 };
 
 struct smb_version_values {
diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
index ba54bf6..d353f6c 100644
--- a/fs/cifs/ioctl.c
+++ b/fs/cifs/ioctl.c
@@ -22,12 +22,112 @@
  */
 
 #include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/mount.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/btrfs.h>
 #include "cifspdu.h"
 #include "cifsglob.h"
 #include "cifsproto.h"
 #include "cifs_debug.h"
 #include "cifsfs.h"
 
+static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
+			unsigned long srcfd, u64 off, u64 len, u64 destoff)
+{
+	int rc;
+	struct cifsFileInfo *smb_file_target = dst_file->private_data;
+	struct inode *target_inode = file_inode(dst_file);
+	struct cifs_tcon *target_tcon;
+	struct fd src_file;
+	struct cifsFileInfo *smb_file_src;
+	struct inode *src_inode;
+	struct cifs_tcon *src_tcon;
+
+	cifs_dbg(FYI, "ioctl clone range\n");
+	/* the destination must be opened for writing */
+	if (!(dst_file->f_mode & FMODE_WRITE)) {
+		cifs_dbg(FYI, "file target not open for write\n");
+		return -EINVAL;
+	}
+
+	/* check if target volume is readonly and take reference */
+	rc = mnt_want_write_file(dst_file);
+	if (rc) {
+		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
+		return rc;
+	}
+
+	src_file = fdget(srcfd);
+	if (!src_file.file) {
+		rc = -EBADF;
+		goto out_drop_write;
+	}
+
+	if ((!src_file.file->private_data) || (!dst_file->private_data)) {
+		rc = -EBADF;
+		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
+		goto out_fput;
+	}
+
+	rc = -EXDEV;
+	smb_file_target = dst_file->private_data;
+	smb_file_src = src_file.file->private_data;
+	src_tcon = tlink_tcon(smb_file_src->tlink);
+	target_tcon = tlink_tcon(smb_file_target->tlink);
+
+	/* check if source and target are on same tree connection */
+	if (src_tcon != target_tcon) {
+		cifs_dbg(VFS, "file copy src and target on different volume\n");
+		goto out_fput;
+	}
+
+	src_inode = src_file.file->f_dentry->d_inode;
+
+	/* Note: cifs case is easier than btrfs since server responsible for */
+	/* checks for proper open modes and file type and if it wants */
+	/* server could even support copy of range where source = target */
+
+	/* so we do not deadlock racing two ioctls on same files */
+	/* btrfs does a similar check */
+	if (target_inode < src_inode) {
+		mutex_lock_nested(&target_inode->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_CHILD);
+	} else {
+		mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&target_inode->i_mutex, I_MUTEX_CHILD);
+	}
+
+	/* determine range to clone */
+	rc = -EINVAL;
+	if (off + len > src_inode->i_size || off + len < off)
+		goto out_unlock;
+	if (len == 0)
+		len = src_inode->i_size - off;
+
+	cifs_dbg(FYI, "about to flush pages\n");
+	/* should we flush first and last page first */
+	truncate_inode_pages_range(&target_inode->i_data, destoff,
+				   PAGE_CACHE_ALIGN(destoff + len)-1);
+
+	if (target_tcon->ses->server->ops->clone_range)
+		rc = target_tcon->ses->server->ops->clone_range(xid,
+			smb_file_src, smb_file_target, off, len, destoff);
+
+	/* force revalidate of size and timestamps of target file now
+	   that target is updated on the server */
+	CIFS_I(target_inode)->time = 0;
+out_unlock:
+	mutex_unlock(&src_inode->i_mutex);
+	mutex_unlock(&target_inode->i_mutex);
+out_fput:
+	fdput(src_file);
+out_drop_write:
+	mnt_drop_write_file(dst_file);
+	return rc;
+}
+
 long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
 {
 	struct inode *inode = file_inode(filep);
@@ -105,6 +205,9 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
 				cifs_dbg(FYI, "set compress flag rc %d\n", rc);
 			}
 			break;
+		case BTRFS_IOC_CLONE:
+			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0);
+			break;
 		default:
 			cifs_dbg(FYI, "unsupported ioctl\n");
 			break;
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index c571be8..11dde4b 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -494,6 +494,85 @@ smb2_close_file(const unsigned int xid, struct cifs_tcon *tcon,
 }
 
 static int
+SMB2_request_res_key(const unsigned int xid, struct cifs_tcon *tcon,
+		     u64 persistent_fid, u64 volatile_fid,
+		     struct copychunk_ioctl *pcchunk)
+{
+	int rc;
+	unsigned int ret_data_len;
+	struct resume_key_req *res_key;
+
+	rc = SMB2_ioctl(xid, tcon, persistent_fid, volatile_fid,
+			FSCTL_SRV_REQUEST_RESUME_KEY, true /* is_fsctl */,
+			NULL, 0 /* no input */,
+			(char **)&res_key, &ret_data_len);
+
+	if (rc) {
+		cifs_dbg(VFS, "refcpy ioctl error %d getting resume key\n", rc);
+		goto req_res_key_exit;
+	}
+	if (ret_data_len < sizeof(struct resume_key_req)) {
+		cifs_dbg(VFS, "Invalid refcopy resume key length\n");
+		rc = -EINVAL;
+		goto req_res_key_exit;
+	}
+	memcpy(pcchunk->SourceKey, res_key->ResumeKey, COPY_CHUNK_RES_KEY_SIZE);
+
+req_res_key_exit:
+	kfree(res_key);
+	return rc;
+}
+
+static int
+smb2_clone_range(const unsigned int xid,
+			struct cifsFileInfo *srcfile,
+			struct cifsFileInfo *trgtfile, u64 src_off,
+			u64 len, u64 dest_off)
+{
+	int rc;
+	unsigned int ret_data_len;
+	struct copychunk_ioctl *pcchunk;
+	char *retbuf = NULL;
+
+	pcchunk = kmalloc(sizeof(struct copychunk_ioctl), GFP_KERNEL);
+
+	if (pcchunk == NULL)
+		return -ENOMEM;
+
+	cifs_dbg(FYI, "in smb2_clone_range - about to call request res key\n");
+	/* Request a key from the server to identify the source of the copy */
+	rc = SMB2_request_res_key(xid, tlink_tcon(srcfile->tlink),
+				srcfile->fid.persistent_fid,
+				srcfile->fid.volatile_fid, pcchunk);
+
+	/* Note: request_res_key sets res_key null only if rc !=0 */
+	if (rc)
+		goto cchunk_out;
+
+	/* For now array only one chunk long, will make more flexible later */
+	pcchunk->ChunkCount = __constant_cpu_to_le32(1);
+	pcchunk->Reserved = 0;
+	pcchunk->SourceOffset = cpu_to_le64(src_off);
+	pcchunk->TargetOffset = cpu_to_le64(dest_off);
+	pcchunk->Length = cpu_to_le32(len);
+	pcchunk->Reserved2 = 0;
+
+	/* Request that server copy to target from src file identified by key */
+	rc = SMB2_ioctl(xid, tlink_tcon(trgtfile->tlink),
+			trgtfile->fid.persistent_fid,
+			trgtfile->fid.volatile_fid, FSCTL_SRV_COPYCHUNK_WRITE,
+			true /* is_fsctl */, (char *)pcchunk,
+			sizeof(struct copychunk_ioctl),	&retbuf, &ret_data_len);
+
+	/* BB need to special case rc = EINVAL to alter chunk size */
+
+	cifs_dbg(FYI, "rc %d data length out %d\n", rc, ret_data_len);
+cchunk_out:
+	kfree(pcchunk);
+	return rc;
+}
+
+static int
 smb2_flush_file(const unsigned int xid, struct cifs_tcon *tcon,
 		struct cifs_fid *fid)
 {
@@ -1017,6 +1096,7 @@ struct smb_version_operations smb20_operations = {
 	.set_oplock_level = smb2_set_oplock_level,
 	.create_lease_buf = smb2_create_lease_buf,
 	.parse_lease_buf = smb2_parse_lease_buf,
+	.clone_range = smb2_clone_range,
 };
 
 struct smb_version_operations smb21_operations = {
@@ -1090,6 +1170,7 @@ struct smb_version_operations smb21_operations = {
 	.set_oplock_level = smb21_set_oplock_level,
 	.create_lease_buf = smb2_create_lease_buf,
 	.parse_lease_buf = smb2_parse_lease_buf,
+	.clone_range = smb2_clone_range,
 };
 
 struct smb_version_operations smb30_operations = {
@@ -1165,6 +1246,7 @@ struct smb_version_operations smb30_operations = {
 	.set_oplock_level = smb3_set_oplock_level,
 	.create_lease_buf = smb3_create_lease_buf,
 	.parse_lease_buf = smb3_parse_lease_buf,
+	.clone_range = smb2_clone_range,
 };
 
 struct smb_version_values smb20_values = {
diff --git a/fs/cifs/smb2pdu.h b/fs/cifs/smb2pdu.h
index 6183b1b..b50a129 100644
--- a/fs/cifs/smb2pdu.h
+++ b/fs/cifs/smb2pdu.h
@@ -534,9 +534,16 @@ struct create_durable {
 	} Data;
 } __packed;
 
+#define COPY_CHUNK_RES_KEY_SIZE	24
+struct resume_key_req {
+	char ResumeKey[COPY_CHUNK_RES_KEY_SIZE];
+	__le32	ContextLength;	/* MBZ */
+	char	Context[0];	/* ignored, Windows sets to 4 bytes of zero */
+} __packed;
+
 /* this goes in the ioctl buffer when doing a copychunk request */
 struct copychunk_ioctl {
-	char SourceKey[24];
+	char SourceKey[COPY_CHUNK_RES_KEY_SIZE];
 	__le32 ChunkCount; /* we are only sending 1 */
 	__le32 Reserved;
 	/* array will only be one chunk long for us */
@@ -546,6 +553,12 @@ struct copychunk_ioctl {
 	__u32 Reserved2;
 } __packed;
 
+struct copychunk_ioctl_rsp {
+	__le32 ChunksWritten;
+	__le32 ChunkBytesWritten;
+	__le32 TotalBytesWritten;
+} __packed;
+
 /* Response and Request are the same format */
 struct validate_negotiate_info {
 	__le32 Capabilities;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2013-10-12 20:31 Innocent Eleazu
  0 siblings, 0 replies; 211+ messages in thread
From: Innocent Eleazu @ 2013-10-12 20:31 UTC (permalink / raw)


Loan offer at 3% interest rate, contact: beverlyloanservices@outlook.com
Note: Reply only to this email: beverlyloanservices@outlook.com

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2013-07-10 10:21 PRAKASH BHALODIYA
  0 siblings, 0 replies; 211+ messages in thread
From: PRAKASH BHALODIYA @ 2013-07-10 10:21 UTC (permalink / raw)


Please, I need your urgent assistance.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2013-06-10 21:05 Pervez Iqbal FMS
  0 siblings, 0 replies; 211+ messages in thread
From: Pervez Iqbal FMS @ 2013-06-10 21:05 UTC (permalink / raw)


Please, I need your urgent assistance.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2013-05-21 21:51 Mrs. Theressa
  0 siblings, 0 replies; 211+ messages in thread
From: Mrs. Theressa @ 2013-05-21 21:51 UTC (permalink / raw)




Are you financially down and in need of financial assistance to settle your bills or debts, with nowhere to go? If yes, contact us for assistance via email: cmothertheressa@yahoo.com

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2013-05-21 21:32 Mrs. Theressa
  0 siblings, 0 replies; 211+ messages in thread
From: Mrs. Theressa @ 2013-05-21 21:32 UTC (permalink / raw)




Are you financially down and in need of financial assistance to settle your bills or debts, with nowhere to go? If yes, contact us for assistance via email: cmothertheressa@yahoo.com

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2013-05-21 21:31 Mrs. Theressa
  0 siblings, 0 replies; 211+ messages in thread
From: Mrs. Theressa @ 2013-05-21 21:31 UTC (permalink / raw)




Are you financially down and in need of financial assistance to settle your bills or debts, with nowhere to go? If yes, contact us for assistance via email: cmothertheressa@yahoo.com

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2013-02-09 20:10 CAMBRIDGE LOAN COMPANY
  0 siblings, 0 replies; 211+ messages in thread
From: CAMBRIDGE LOAN COMPANY @ 2013-02-09 20:10 UTC (permalink / raw)


Do You Need A Loan at 3%? Email Amount,Country,Duration,Phone Number.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2013-02-05 17:09 CAMBRIDGE LOAN COMPANY
  0 siblings, 0 replies; 211+ messages in thread
From: CAMBRIDGE LOAN COMPANY @ 2013-02-05 17:09 UTC (permalink / raw)


Do you need a personal or business loan? If yes, email us for more info.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2013-01-26  7:53 SMITH KEN LOAN FIRM
  0 siblings, 0 replies; 211+ messages in thread
From: SMITH KEN LOAN FIRM @ 2013-01-26  7:53 UTC (permalink / raw)


DO YOU NEED LOAN @ 3% APPLY WITH AMOUNT AND DURATION

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2013-01-25 11:01 SMITH KEN LOAN FIRM
  0 siblings, 0 replies; 211+ messages in thread
From: SMITH KEN LOAN FIRM @ 2013-01-25 11:01 UTC (permalink / raw)


UNSECURED LOAN OFFER @ 3% E-MAIL US AMOUNT AND DURATION 

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2013-01-07 19:54 Financial Service Provider
  0 siblings, 0 replies; 211+ messages in thread
From: Financial Service Provider @ 2013-01-07 19:54 UTC (permalink / raw)



Loan offer at 3% apply now and get financed Today!


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2012-12-17 22:28 info
  0 siblings, 0 replies; 211+ messages in thread
From: info @ 2012-12-17 22:28 UTC (permalink / raw)




^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2012-12-04 14:23 Mr.Cooley Bruce
  0 siblings, 0 replies; 211+ messages in thread
From: Mr.Cooley Bruce @ 2012-12-04 14:23 UTC (permalink / raw)


Do you need a loan? If yes, reply to us now.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2012-12-03 11:39 Ernest Wilson
  0 siblings, 0 replies; 211+ messages in thread
From: Ernest Wilson @ 2012-12-03 11:39 UTC (permalink / raw)
  To: info


REF No: L/200-26937
BATCH No: 2007MJL-01


Your e-mail address has won you 750,000 GBP in the Microsoft end-of-year raffle draw award 2012. Contact this email: (ernestwilson750@zhot.net) with your name, address, phone number and age.

Regards,
Ernest Wilson.



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2012-11-20  8:15 darrick.wong
  0 siblings, 0 replies; 211+ messages in thread
From: darrick.wong @ 2012-11-20  8:15 UTC (permalink / raw)


>From nobody Mon Nov 19 23:51:14 2012
Subject: [PATCH 4/9] xfs: honor the O_SYNC flag for asynchronous direct I/O
 requests
To: axboe@kernel.dk, tytso@mit.edu, david@fromorbit.com, jmoyer@redhat.com,
 bpm@sgi.com, viro@zeniv.linux.org.uk, jack@suse.cz
From: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org,
 linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
 djwong+kernel@djwong.org
Date: Mon, 19 Nov 2012 23:51:14 -0800
Message-ID: <20121120075114.25270.40680.stgit@blackbox.djwong.org>
In-Reply-To: <20121120074116.24645.36369.stgit@blackbox.djwong.org>
References: <20121120074116.24645.36369.stgit@blackbox.djwong.org>
User-Agent: StGit/0.15
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

If a file is opened with O_SYNC|O_DIRECT, the drive cache does not get
flushed after the write completion for AIOs.  This patch attempts to fix
that problem by marking an I/O as requiring a cache flush in endio
processing, and then issuing the cache flush after any unwritten extent
conversion is done.

From: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
[darrick.wong@oracle.com: Rework patch to use per-mount workqueues]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c  |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_aops.h  |    1 +
 fs/xfs/xfs_mount.h |    1 +
 fs/xfs/xfs_super.c |    8 ++++++++
 4 files changed, 61 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e57e2da..9cebbb7 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -173,6 +173,24 @@ xfs_setfilesize(
 }
 
 /*
+ * In the case of synchronous, AIO, O_DIRECT writes, we need to flush
+ * the disk cache when the I/O is complete.
+ */
+STATIC bool
+xfs_ioend_needs_cache_flush(
+	struct xfs_ioend	*ioend)
+{
+	struct xfs_inode *ip = XFS_I(ioend->io_inode);
+	struct xfs_mount *mp = ip->i_mount;
+
+	if (!(mp->m_flags & XFS_MOUNT_BARRIER))
+		return false;
+
+	return IS_SYNC(ioend->io_inode) ||
+	       (ioend->io_iocb->ki_filp->f_flags & O_DSYNC);
+}
+
+/*
  * Schedule IO completion handling on the final put of an ioend.
  *
  * If there is no work to do we might as well call it a day and free the
@@ -189,11 +207,30 @@ xfs_finish_ioend(
 			queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
 		else if (ioend->io_append_trans)
 			queue_work(mp->m_data_workqueue, &ioend->io_work);
+		else if (ioend->io_needs_fsync)
+			queue_work(mp->m_aio_blkdev_flush_wq, &ioend->io_work);
 		else
 			xfs_destroy_ioend(ioend);
 	}
 }
 
+STATIC int
+xfs_ioend_force_cache_flush(
+	xfs_ioend_t	*ioend)
+{
+	struct xfs_inode *ip = XFS_I(ioend->io_inode);
+	struct xfs_mount *mp = ip->i_mount;
+	int		err = 0;
+	int		datasync;
+
+	datasync = !IS_SYNC(ioend->io_inode) &&
+		!(ioend->io_iocb->ki_filp->f_flags & __O_SYNC);
+	err = do_xfs_file_fsync(ip, mp, datasync);
+	xfs_destroy_ioend(ioend);
+	/* do_xfs_file_fsync returns -errno. our caller expects positive. */
+	return -err;
+}
+
 /*
  * IO write completion.
  */
@@ -250,12 +287,22 @@ xfs_end_io(
 		error = xfs_setfilesize(ioend);
 		if (error)
 			ioend->io_error = -error;
+	} else if (ioend->io_needs_fsync) {
+		error = xfs_ioend_force_cache_flush(ioend);
+		if (error && ioend->io_result > 0)
+			ioend->io_error = -error;
+		ioend->io_needs_fsync = 0;
 	} else {
 		ASSERT(!xfs_ioend_is_append(ioend));
 	}
 
 done:
-	xfs_destroy_ioend(ioend);
+	/* the honoring of O_SYNC has to be done last */
+	if (ioend->io_needs_fsync) {
+		atomic_inc(&ioend->io_remaining);
+		xfs_finish_ioend(ioend);
+	} else
+		xfs_destroy_ioend(ioend);
 }
 
 /*
@@ -292,6 +339,7 @@ xfs_alloc_ioend(
 	atomic_set(&ioend->io_remaining, 1);
 	ioend->io_isasync = 0;
 	ioend->io_isdirect = 0;
+	ioend->io_needs_fsync = 0;
 	ioend->io_error = 0;
 	ioend->io_list = NULL;
 	ioend->io_type = type;
@@ -1409,6 +1457,8 @@ xfs_end_io_direct_write(
 
 	if (is_async) {
 		ioend->io_isasync = 1;
+		if (xfs_ioend_needs_cache_flush(ioend))
+			ioend->io_needs_fsync = 1;
 		xfs_finish_ioend(ioend);
 	} else {
 		xfs_finish_ioend_sync(ioend);
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index c325abb..e48c7c2 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -47,6 +47,7 @@ typedef struct xfs_ioend {
 	atomic_t		io_remaining;	/* hold count */
 	unsigned int		io_isasync : 1;	/* needs aio_complete */
 	unsigned int		io_isdirect : 1;/* direct I/O */
+	unsigned int		io_needs_fsync : 1; /* aio+dio+o_sync */
 	struct inode		*io_inode;	/* file being written to */
 	struct buffer_head	*io_buffer_head;/* buffer linked list head */
 	struct buffer_head	*io_buffer_tail;/* buffer linked list tail */
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index deee09e..ecd3d2e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -209,6 +209,7 @@ typedef struct xfs_mount {
 	struct workqueue_struct	*m_data_workqueue;
 	struct workqueue_struct	*m_unwritten_workqueue;
 	struct workqueue_struct	*m_cil_workqueue;
+	struct workqueue_struct *m_aio_blkdev_flush_wq;
 } xfs_mount_t;
 
 /*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 26a09bd..b05b557 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -863,8 +863,15 @@ xfs_init_mount_workqueues(
 			WQ_MEM_RECLAIM, 0, mp->m_fsname);
 	if (!mp->m_cil_workqueue)
 		goto out_destroy_unwritten;
+
+	mp->m_aio_blkdev_flush_wq = alloc_workqueue("xfs-aio-blkdev-flush/%s",
+			WQ_MEM_RECLAIM, 0, mp->m_fsname);
+	if (!mp->m_aio_blkdev_flush_wq)
+		goto out_destroy_cil_queue;
 	return 0;
 
+out_destroy_cil_queue:
+	destroy_workqueue(mp->m_cil_workqueue);
 out_destroy_unwritten:
 	destroy_workqueue(mp->m_unwritten_workqueue);
 out_destroy_data_iodone_queue:
@@ -877,6 +884,7 @@ STATIC void
 xfs_destroy_mount_workqueues(
 	struct xfs_mount	*mp)
 {
+	destroy_workqueue(mp->m_aio_blkdev_flush_wq);
 	destroy_workqueue(mp->m_cil_workqueue);
 	destroy_workqueue(mp->m_data_workqueue);
 	destroy_workqueue(mp->m_unwritten_workqueue);



^ permalink raw reply related	[flat|nested] 211+ messages in thread

* (unknown)
@ 2012-11-20  8:07 darrick.wong
  0 siblings, 0 replies; 211+ messages in thread
From: darrick.wong @ 2012-11-20  8:07 UTC (permalink / raw)


>From nobody Mon Nov 19 23:51:14 2012
Subject: [PATCH 5/9] btrfs: Use generic handlers of O_SYNC AIO DIO
To: axboe@kernel.dk, tytso@mit.edu, david@fromorbit.com, jmoyer@redhat.com,
 bpm@sgi.com, viro@zeniv.linux.org.uk, jack@suse.cz
From: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org,
 linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
 djwong+kernel@djwong.org
Date: Mon, 19 Nov 2012 23:51:14 -0800
Message-ID: <20121120075114.25270.85716.stgit@blackbox.djwong.org>
In-Reply-To: <20121120074116.24645.36369.stgit@blackbox.djwong.org>
References: <20121120074116.24645.36369.stgit@blackbox.djwong.org>
User-Agent: StGit/0.15
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Use generic handlers to queue fsync() when AIO DIO is completed for an
O_SYNC file. Although we use our own bio->end_io function, we call
dio_end_io() from it and thus, because we don't set any specific
dio->end_io function, generic code ends up calling generic_dio_end_io(),
which is all we need for proper O_SYNC AIO DIO handling.

From: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
[darrick.wong@oracle.com: Don't issue flush if aio is queued]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/btrfs/file.c  |    2 +-
 fs/btrfs/inode.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9ab1bed..37b5bb3 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1495,7 +1495,7 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	 * one running right now.
 	 */
 	BTRFS_I(inode)->last_trans = root->fs_info->generation + 1;
-	if (num_written > 0 || num_written == -EIOCBQUEUED) {
+	if (num_written > 0) {
 		err = generic_write_sync(file, pos, num_written);
 		if (err < 0 && num_written > 0)
 			num_written = err;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95542a1..c8b6049 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6579,7 +6579,7 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 	return __blockdev_direct_IO(rw, iocb, inode,
 		   BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev,
 		   iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
-		   btrfs_submit_direct, 0);
+		   btrfs_submit_direct, DIO_SYNC_WRITES);
 }
 
 static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,

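The one-line change in btrfs_file_aio_write() above can be read as a
change of policy: once the generic DIO completion path handles O_SYNC,
the submitter should no longer sync when the request was merely queued.
A minimal sketch of the before and after conditions follows. The helper
names are hypothetical, and EIOCBQUEUED is a kernel-internal code that
user-space headers do not export, so its value (529, as in the kernel's
include/linux/errno.h) is defined here as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

#ifndef EIOCBQUEUED
#define EIOCBQUEUED 529	/* assumption: kernel-internal "iocb queued" code */
#endif

/* Before the patch: sync in the submitter even if the AIO was only queued. */
static bool sync_in_submitter_old(ssize_t num_written)
{
	return num_written > 0 || num_written == -EIOCBQUEUED;
}

/* After the patch: queued AIO is synced later, at I/O completion. */
static bool sync_in_submitter_new(ssize_t num_written)
{
	return num_written > 0;
}
```

The only input on which the two predicates differ is -EIOCBQUEUED,
which is the case the DIO_SYNC_WRITES flag is meant to cover.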


^ permalink raw reply related	[flat|nested] 211+ messages in thread

* (unknown)
@ 2012-11-20  8:05 darrick.wong
  0 siblings, 0 replies; 211+ messages in thread
From: darrick.wong @ 2012-11-20  8:05 UTC (permalink / raw)


>From nobody Mon Nov 19 23:51:14 2012
Subject: [PATCH 4/9] xfs: honor the O_SYNC flag for asynchronous direct I/O
 requests
To: axboe@kernel.dk, tytso@mit.edu, david@fromorbit.com, jmoyer@redhat.com,
 bpm@sgi.com, viro@zeniv.linux.org.uk, jack@suse.cz
From: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org,
 linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Date: Mon, 19 Nov 2012 23:51:14 -0800
Message-ID: <20121120075114.25270.40680.stgit@blackbox.djwong.org>
In-Reply-To: <20121120074116.24645.36369.stgit@blackbox.djwong.org>
References: <20121120074116.24645.36369.stgit@blackbox.djwong.org>
User-Agent: StGit/0.15
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

If a file is opened with O_SYNC|O_DIRECT, the drive cache does not get
flushed after the write completion for AIOs.  This patch attempts to fix
that problem by marking an I/O as requiring a cache flush in endio
processing, and then issuing the cache flush after any unwritten extent
conversion is done.

From: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
[darrick.wong@oracle.com: Rework patch to use per-mount workqueues]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c  |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_aops.h  |    1 +
 fs/xfs/xfs_mount.h |    1 +
 fs/xfs/xfs_super.c |    8 ++++++++
 4 files changed, 61 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e57e2da..9cebbb7 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -173,6 +173,24 @@ xfs_setfilesize(
 }
 
 /*
+ * In the case of synchronous, AIO, O_DIRECT writes, we need to flush
+ * the disk cache when the I/O is complete.
+ */
+STATIC bool
+xfs_ioend_needs_cache_flush(
+	struct xfs_ioend	*ioend)
+{
+	struct xfs_inode *ip = XFS_I(ioend->io_inode);
+	struct xfs_mount *mp = ip->i_mount;
+
+	if (!(mp->m_flags & XFS_MOUNT_BARRIER))
+		return false;
+
+	return IS_SYNC(ioend->io_inode) ||
+	       (ioend->io_iocb->ki_filp->f_flags & O_DSYNC);
+}
+
+/*
  * Schedule IO completion handling on the final put of an ioend.
  *
  * If there is no work to do we might as well call it a day and free the
@@ -189,11 +207,30 @@ xfs_finish_ioend(
 			queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
 		else if (ioend->io_append_trans)
 			queue_work(mp->m_data_workqueue, &ioend->io_work);
+		else if (ioend->io_needs_fsync)
+			queue_work(mp->m_aio_blkdev_flush_wq, &ioend->io_work);
 		else
 			xfs_destroy_ioend(ioend);
 	}
 }
 
+STATIC int
+xfs_ioend_force_cache_flush(
+	xfs_ioend_t	*ioend)
+{
+	struct xfs_inode *ip = XFS_I(ioend->io_inode);
+	struct xfs_mount *mp = ip->i_mount;
+	int		err = 0;
+	int		datasync;
+
+	datasync = !IS_SYNC(ioend->io_inode) &&
+		!(ioend->io_iocb->ki_filp->f_flags & __O_SYNC);
+	err = do_xfs_file_fsync(ip, mp, datasync);
+	xfs_destroy_ioend(ioend);
+	/* do_xfs_file_fsync returns -errno. our caller expects positive. */
+	return -err;
+}
+
 /*
  * IO write completion.
  */
@@ -250,12 +287,22 @@ xfs_end_io(
 		error = xfs_setfilesize(ioend);
 		if (error)
 			ioend->io_error = -error;
+	} else if (ioend->io_needs_fsync) {
+		error = xfs_ioend_force_cache_flush(ioend);
+		if (error && ioend->io_result > 0)
+			ioend->io_error = -error;
+		ioend->io_needs_fsync = 0;
 	} else {
 		ASSERT(!xfs_ioend_is_append(ioend));
 	}
 
 done:
-	xfs_destroy_ioend(ioend);
+	/* the honoring of O_SYNC has to be done last */
+	if (ioend->io_needs_fsync) {
+		atomic_inc(&ioend->io_remaining);
+		xfs_finish_ioend(ioend);
+	} else
+		xfs_destroy_ioend(ioend);
 }
 
 /*
@@ -292,6 +339,7 @@ xfs_alloc_ioend(
 	atomic_set(&ioend->io_remaining, 1);
 	ioend->io_isasync = 0;
 	ioend->io_isdirect = 0;
+	ioend->io_needs_fsync = 0;
 	ioend->io_error = 0;
 	ioend->io_list = NULL;
 	ioend->io_type = type;
@@ -1409,6 +1457,8 @@ xfs_end_io_direct_write(
 
 	if (is_async) {
 		ioend->io_isasync = 1;
+		if (xfs_ioend_needs_cache_flush(ioend))
+			ioend->io_needs_fsync = 1;
 		xfs_finish_ioend(ioend);
 	} else {
 		xfs_finish_ioend_sync(ioend);
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index c325abb..e48c7c2 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -47,6 +47,7 @@ typedef struct xfs_ioend {
 	atomic_t		io_remaining;	/* hold count */
 	unsigned int		io_isasync : 1;	/* needs aio_complete */
 	unsigned int		io_isdirect : 1;/* direct I/O */
+	unsigned int		io_needs_fsync : 1; /* aio+dio+o_sync */
 	struct inode		*io_inode;	/* file being written to */
 	struct buffer_head	*io_buffer_head;/* buffer linked list head */
 	struct buffer_head	*io_buffer_tail;/* buffer linked list tail */
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index deee09e..ecd3d2e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -209,6 +209,7 @@ typedef struct xfs_mount {
 	struct workqueue_struct	*m_data_workqueue;
 	struct workqueue_struct	*m_unwritten_workqueue;
 	struct workqueue_struct	*m_cil_workqueue;
+	struct workqueue_struct *m_aio_blkdev_flush_wq;
 } xfs_mount_t;
 
 /*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 26a09bd..b05b557 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -863,8 +863,15 @@ xfs_init_mount_workqueues(
 			WQ_MEM_RECLAIM, 0, mp->m_fsname);
 	if (!mp->m_cil_workqueue)
 		goto out_destroy_unwritten;
+
+	mp->m_aio_blkdev_flush_wq = alloc_workqueue("xfs-aio-blkdev-flush/%s",
+			WQ_MEM_RECLAIM, 0, mp->m_fsname);
+	if (!mp->m_aio_blkdev_flush_wq)
+		goto out_destroy_cil_queue;
 	return 0;
 
+out_destroy_cil_queue:
+	destroy_workqueue(mp->m_cil_workqueue);
 out_destroy_unwritten:
 	destroy_workqueue(mp->m_unwritten_workqueue);
 out_destroy_data_iodone_queue:
@@ -877,6 +884,7 @@ STATIC void
 xfs_destroy_mount_workqueues(
 	struct xfs_mount	*mp)
 {
+	destroy_workqueue(mp->m_aio_blkdev_flush_wq);
 	destroy_workqueue(mp->m_cil_workqueue);
 	destroy_workqueue(mp->m_data_workqueue);
 	destroy_workqueue(mp->m_unwritten_workqueue);



^ permalink raw reply related	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2012-10-14  9:55 Alexey Dobriyan
  0 siblings, 0 replies; 211+ messages in thread
From: Alexey Dobriyan @ 2012-10-14  9:55 UTC (permalink / raw)
  To: linux-fsdevel, linux-joystick, linux-mips, linux-next, mletyns

  http://totalizator-online.com/wp-content/plugins/akismet/money.html

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2012-07-25  9:39 Cyrill Gorcunov
  0 siblings, 0 replies; 211+ messages in thread
From: Cyrill Gorcunov @ 2012-07-25  9:39 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2012-05-05 18:59 Mrs Sabah Halif
  0 siblings, 0 replies; 211+ messages in thread
From: Mrs Sabah Halif @ 2012-05-05 18:59 UTC (permalink / raw)




-- 
Good day,my name is Mrs Sabah Halif  i have a business proposal please contact me for details.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2012-04-12 11:22 monicaaluke01@gmail.com
  0 siblings, 0 replies; 211+ messages in thread
From: monicaaluke01@gmail.com @ 2012-04-12 11:22 UTC (permalink / raw)


Do you need a loan?
Вам нужен кредит?

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2012-02-17 20:28 Brian Major
  0 siblings, 0 replies; 211+ messages in thread
From: Brian Major @ 2012-02-17 20:28 UTC (permalink / raw)


I am Brian Major, I have a business proposal of ?9.8million for you,contact
back if interested.


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2012-02-15 17:47 Ann Adams
  0 siblings, 0 replies; 211+ messages in thread
From: Ann Adams @ 2012-02-15 17:47 UTC (permalink / raw)





Hi
  Sorry for the sudden contact with you via email, but please i have looked
  For you in the past few months now without any good result that is why i
  am using this medium, i would appreciate if you did contact me for a brief
  Discussion my Phone number is (+44) 703 595 6471 and email is michaelachambers@live.co.uk
    Thanks In Advance
    Michael Aiden
Note: All corresponding email should be sent to michaelachambers@live.co.uk  for an immediate attention.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2011-10-18  6:43 Benjamin Albert
  0 siblings, 0 replies; 211+ messages in thread
From: Benjamin Albert @ 2011-10-18  6:43 UTC (permalink / raw)


I am contacting you in regards to a business transfer of a huge sum of money from a deceased account. Though I know that a transaction of this magnitude will make anyone apprehensive and worried, but I am assuring you that all will be well at the end of the day. I decided to contact you due to the urgency of this transaction.please email me on Email: benalbert2011@hotmail.co.uk

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2011-08-25  1:27 con@telus.net
  0 siblings, 0 replies; 211+ messages in thread
From: con@telus.net @ 2011-08-25  1:27 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 27 bytes --]

KINDLY DOWNLOAD ATTACHMENT

[-- Attachment #2: NOTIFICATION BOARD.txt --]
[-- Type: application/octet-stream, Size: 674 bytes --]

You have been selected in the on-going COCA COLA award held this August
2011.We the Promo Board are pleased to inform you that you alongside four(4)
otherlucky winners have been approved for a payment of 1,000 000GBP (One
Million Pounds Sterling).
If you did receive this email, it means you are one of the five(5)lucky
winners.

*CLAIMS PROCESSING OFFICER:
*Name:Mr TOMMY ROGER
E.mail: claimsgroup222@qatar.io

You are also advised to provide him with the under listed information
as soon as possible:

*NAME IN FULL:
*DELIVERY ADDRESS:
*SEX:
*AGE:
*COUNTRY:
*NATIONALITY:
*OCCUPATION:
*PHONE:

We are glad to have you as one of our luckly winners.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2011-07-23  8:42 Rudi
  0 siblings, 0 replies; 211+ messages in thread
From: Rudi @ 2011-07-23  8:42 UTC (permalink / raw)
  To: linux-fsdevel



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2011-06-26  3:23 Money Gram Transfer
  0 siblings, 0 replies; 211+ messages in thread
From: Money Gram Transfer @ 2011-06-26  3:23 UTC (permalink / raw)



-- 
My associate has helped me to send your first payment
of $5000 USD to you as instructed by the Malaysian
Government and Mr. David Cameron the United Kingdom
prime minister after the last G20 meeting that was
held in Malaysia, making you one of the beneficiaries.
Here is the information below.

Refrence Numbers: 86147516
Sender Name Is = Patrick Lee Chun

I told him to keep sending you $5000 USD twice a week
until the FULL payment of ($820000.00 United State Dollars)
is completed.

A certificate will be made to change the Receivers Name
to your name as stated by the Malaysian Government,reconfirm
your {1}Full Names {2}address {3}Mobile Number

via Email to:money_gram_transfer@ozledim.net Allan Davis
to proceed.

        Note:

You cannot pickup the money until the certificate is
obtained by you.

Regards
Mr. Allan Davis.
Tel: +(60)163544376.

For more info: www.g20.org






^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2011-06-21 22:21 Ntai Jerry
  0 siblings, 0 replies; 211+ messages in thread
From: Ntai Jerry @ 2011-06-21 22:21 UTC (permalink / raw)


My name is Mr. Jerry Ntai; I am the Head of Operations in Mevas Bank, Hong
Kong. I have a business proposal in the tune of US$25.2m to be transferred
to an offshore account with your assistance if willing. After the
successful transfer, we shall share in ratio of 30% for you and 70% for
me. Should you be interested, please respond to my letter immediately, so
we can commence all arrangements and I will give you more information on
the project and how we would handle it.

You can contact me on my private email: ( j.ntai1100@gmail.com  ) and
send me the following information for documentation purpose:


(1) Full name:
(2) Private phone number:
(3) Current residential address:
(4) Occupation:
(5) Age and Sex

I look forward to hearing from you.

Kind Regards.




^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
  2011-05-03 11:01 [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock Surbhi Palande
@ 2011-05-03 13:08 ` Surbhi Palande
  0 siblings, 0 replies; 211+ messages in thread
From: Surbhi Palande @ 2011-05-03 13:08 UTC (permalink / raw)
  To: jack
  Cc: toshi.okajima, tytso, m.mizuma, adilger.kernel, linux-ext4,
	linux-fsdevel, sandeen


On munmap(), zap_pte_range() is called, which marks pages with dirty PTEs as
dirty, as Toshiyuki pointed out.

zap_pte_range()
  mapping->a_ops->set_page_dirty (= ext4_journalled_set_page_dirty)  

So I think this is where we should check whether the ext4 filesystem is
frozen, and also prevent a parallel ext4 freeze from happening.
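
For reference, the user-space sequence that reaches this path can be sketched
as below. This is only an illustration, not part of the patch: the function
name dirty_page_then_munmap is made up, the 4096-byte length is an assumed
page size, and the sketch does not freeze the filesystem. It dirties a
shared file mapping and then unmaps it, which is the point where the kernel
transfers the PTE dirty state to the page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty one page of a MAP_SHARED file mapping and
 * unmap it. Returns 0 if the data is visible through the file afterwards,
 * -1 on any failure.
 */
int dirty_page_then_munmap(const char *path)
{
	const size_t len = 4096;	/* assumed page size */
	char buf[5];
	char *p;
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* The store makes the PTE dirty; the page itself is not yet marked. */
	memcpy(p, "dirty", 5);

	/* munmap() is where zap_pte_range() ends up calling set_page_dirty. */
	ret = munmap(p, len);

	/* The written bytes must be visible through the file descriptor. */
	if (ret == 0 &&
	    (pread(fd, buf, 5, 0) != 5 || memcmp(buf, "dirty", 5) != 0))
		ret = -1;

	close(fd);
	return ret;
}
```

Running this against a file on a frozen ext4 filesystem mounted with
data=journal would be the case of interest; on any other filesystem it
simply succeeds.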

Attaching a patch for initial review. Please do let me know your thoughts! 

Thanks a lot!

Warm Regards,
Surbhi.



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2011-04-28  6:00 Amir Goldstein
  0 siblings, 0 replies; 211+ messages in thread
From: Amir Goldstein @ 2011-04-28  6:00 UTC (permalink / raw)
  To: linux-fsdevel

subscribe

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2011-04-16 11:30 Alexander Andrew Flockhart
  0 siblings, 0 replies; 211+ messages in thread
From: Alexander Andrew Flockhart @ 2011-04-16 11:30 UTC (permalink / raw)




Hello,
  I have a business for you to handle with me. Should you be interested, 
please contact me.
A.A. Flockhart

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2011-03-22  0:48 Sage Weil
  0 siblings, 0 replies; 211+ messages in thread
From: Sage Weil @ 2011-03-22  0:48 UTC (permalink / raw)
  To: linux-fsdevel, viro; +Cc: linux-kernel, ceph-devel

From: Sage Weil <sage@newdream.net> Date: Mon, 21 Mar 2011 15:51:04 

The Ceph client is told by the server when it has the entire contents of 
a directory in cache, and is notified prior to any changes.  However, 
the current VFS interfaces simply do not allow the fs to take advantage 
of the known-valid cached content in a non-racy way.  To do so, the fs 
needs some notification prior to dentries being dropped out of the 
dcache (e.g. due to memory pressure).  Instead, Ceph is currently forced 
to talk to the server, which is quite frustrating (and slow).

The first patch adds a new d_prune dentry_operation that is called 
before the VFS throws dentries out of cache (specifically, before the 
victim dentry is unhashed).  The next two patches make the necessary 
changes in the Ceph fs code to safely clear a D_COMPLETE flag in the 
directory dentry's d_fsdata when a child is pruned.  The third patch 
specifically compensates for calls to dentry_unhash() in vfs_rmdir() and 
vfs_rename_dir().  The last patch adjusts the Ceph fs code to take 
advantage of the new flag.  That change is pretty simple because most of 
the infrastructure is already in place (we were previously relying on 
d_release for racy notification of pruning).
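
To make the intended ordering concrete, here is a small user-space model of
the idea. It is a sketch only, not the kernel code from the patches: every
model_* name is invented for illustration, and the "complete" field merely
stands in for the D_COMPLETE flag kept in d_fsdata. The point it shows is
that the filesystem hook runs before the victim is unhashed, so the parent
directory can never be seen as complete while a child is missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dentry;

/* Stand-in for dentry_operations with the proposed d_prune hook. */
struct model_dentry_ops {
	void (*d_prune)(struct model_dentry *dentry);
};

struct model_dentry {
	struct model_dentry *parent;
	const struct model_dentry_ops *ops;
	bool hashed;	/* still reachable by lookup */
	bool complete;	/* models D_COMPLETE on a directory */
};

/* Filesystem side: a pruned child means the parent is no longer complete. */
static void model_ceph_d_prune(struct model_dentry *dentry)
{
	if (dentry->parent)
		dentry->parent->complete = false;
}

const struct model_dentry_ops model_ceph_ops = {
	.d_prune = model_ceph_d_prune,
};

/* VFS side: notify the filesystem first, then unhash the victim. */
void model_prune_dentry(struct model_dentry *dentry)
{
	assert(dentry != NULL);
	if (dentry->ops && dentry->ops->d_prune)
		dentry->ops->d_prune(dentry);
	dentry->hashed = false;
}

/* A readdir may be served from cache only while the directory is complete. */
bool model_can_readdir_from_cache(const struct model_dentry *dir)
{
	return dir->complete;
}
```

In this model, pruning any child of a complete directory flips the
directory back to "must ask the server", which is the behaviour the cover
letter describes.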

Adding this interface would more or less codify the idea that the VFS 
shouldn't unhash random dentries without first calling d_prune.  There 
are currently two places where the VFS unhashes: vfs_rmdir and 
vfs_rename_dir both call dentry_unhash(), which is there to make it easy 
for simple file systems to avoid races with directory removal and 
lookups.  That could arguably be pushed down into those file systems, 
but it's a more delicate cleanup.

Is the d_prune d_op a reasonable VFS interface extension?  Is it 
acceptable in its current form?

Thanks!
sage


See also
  git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git d_prune


Sage Weil (4):
  vfs: add d_prune dentry operation
  ceph: clear parent D_COMPLETE flag when on dentry prune
  ceph: compensate for dentry_unhash() calls in vfs_rmdir() and
    vfs_rename_dir()
  ceph: use new D_COMPLETE dentry flag

 Documentation/filesystems/Locking |    1 +
 fs/ceph/caps.c                    |    8 +--
 fs/ceph/dir.c                     |  110 ++++++++++++++++++++++++++++++++-----
 fs/ceph/inode.c                   |    9 +--
 fs/ceph/mds_client.c              |    6 +-
 fs/ceph/super.h                   |   23 +++++++-
 fs/dcache.c                       |    8 +++
 include/linux/dcache.h            |    3 +
 8 files changed, 139 insertions(+), 29 deletions(-)



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2011-03-01 23:22 Mr Henry Henmora
  0 siblings, 0 replies; 211+ messages in thread
From: Mr Henry Henmora @ 2011-03-01 23:22 UTC (permalink / raw)


Do you need a loan for any purpose? Do you have a financial problem? Do
you need a financial solution? Mr. Henry's
 loans are the solution to all financial problems; our loans are easy,
cheap, and fast. Write to us today for the loan you want, and we will
arrange any loan to suit your budget
at only 3% interest. If you are
interested, contact us immediately. Optional loan
protection lets you
meet your loan repayments if you cannot work because of illness,
accident, or
unemployment. Just take out this valuable insurance when you apply
for your loan,
and remember to tell us if you want it:
henmoralendingfirm@gmail.com

* LOAN APPLICATION FORM *

* Full name ............*

* Home address ....................... ..*

* Date of birth ......................*

* Phone number ...................*

* MOBILE number, if any ..............*

* LOAN amount required .................*

* FAX .................*

* Nationality ..................*

* COUNTRY ........................*

* OCCUPATION ....................*

* SEX ..................................*

* MALE .............................*

* FEMALE .........................*

* DIVORCED, IF SO ......................*

* Next of kin .......................*

* NAME .......................... ...*

* Date of birth .....................*

* PURPOSE OF LOAN .......................... .......*

* Loan duration ........................*

* ID .......................*

* Regards *


* Mr. henry *
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2011-02-28 12:45 Rolande.Blondeau
  0 siblings, 0 replies; 211+ messages in thread
From: Rolande.Blondeau @ 2011-02-28 12:45 UTC (permalink / raw)




My working partner in relationship with
HSBC London has concluded that our working
partner has helped us to send you first payment of US$5,000 to you as
instructed by Malaysia government and will
keep sending you $5000 twice a week until
the payment of (US$820,000 ) is completed
within six months and here is the information


MONEY TRANSFER REFERENCE:2116-3297

SENDER'S NAME: Mike Marx
AMOUNT: US$5000
To track your funds forward money gram
Transfer agent Mr Allan Davis

Your Name.__________________________
Phone .__________________________

Contact Allan Davis for the funds clearance
certificate neccessary for the realise of your funds

E-mail:mrallan_davis1@yahoo.co.jp
D/L: Tel:+601-635-44376

Please direct all enquiring to:
money gram
Alex Rogers: Please direct all enquiring to:
dmr.allan@yahoo.com.hk 

Best Regards,
Mr Allan Davis

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2010-12-14 16:12 RED DOT COMPANY
  0 siblings, 0 replies; 211+ messages in thread
From: RED DOT COMPANY @ 2010-12-14 16:12 UTC (permalink / raw)


We are happy to notify you that your e-mail address was selected which won
you 600,000 Dollars in our on-going 2010 award presentation,

To file for the claim of this award money you are
therefore advice to reply back with your

Name....
Address....
Occupation....
Contact number.......
Age...

Upon your response you will be updated with more information on
the claim.feel free to call us on the number below.
+2348091425514

Regards
Dr. Peter Zec (Red Dot).






^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), , 
@ 2010-11-16 13:59 Ming-Yang Lee
  0 siblings, 0 replies; 211+ messages in thread
From: Ming-Yang Lee @ 2010-11-16 13:59 UTC (permalink / raw)




Do you need a loan to pay your bills or to start up a business or for Xmas?.
Kindly apply now for a low rate loan of 3%. for more information contact:
ming.yangfundsservice@qatar.io
We Await Your Response.
Mr Ming-Yang Lee

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.



-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2010-10-17 12:54 GAR Transport Ltd.Şti.
  0 siblings, 0 replies; 211+ messages in thread
From: GAR Transport Ltd.Şti. @ 2010-10-17 12:54 UTC (permalink / raw)
  To: linux-fsdevel

Wishing all of us a healthy, happy, and prosperous new week.
As the GAR TRANSPORT family, we are at the service of our valued customers for full-load shipments to ALL OF EUROPE and to BOSNIA, SERBIA, CROATIA, BULGARIA, ROMANIA, THE TURKIC REPUBLICS, and IRAN.

OUR EXPORT PART-LOAD VEHICLE DEPARTURE SCHEDULE IS PRESENTED BELOW FOR YOUR INFORMATION.

Wednesday & Saturday : Plovdiv - Sofia
                               : Bucharest
                               : Belgrade
                               : Sarajevo
                               : Zagreb
                               : Baku
                               : Kazakhstan
                               : Uzbekistan
                               : Turkmenistan

** Every week we have regular groupage shipments from our warehouse in Istanbul, delivered to any point in BULGARIA. We look forward to your calls about working together; thank you in advance.

* Our export warehouse is in YENIBOSNA.
* Our company has service offices and agencies in the countries to which it carries part loads.
* Our company also offers full-load and project transport services on the same routes.

OUR MISSION: to take on not only your load but also your obligations, which are often heavier than your load.

NOTE: For BG, our groupage departures follow the main route stated above; for other destinations, goods are transferred from our PLOVDIV center by our domestic transport vehicles.

WAREHOUSE ADDRESS : CEMAL ULUSOY CAD. BASIN EKSPRES YOLU NO: 5
TEL : +90 555 974 60 99
CONTACT : Seyithan Bey

On request, we provide export and import CUSTOMS CLEARANCE services at the Istanbul customs offices.

Ibrahim Hilmi CINDIK 
GAR Transportation OOD
PLOVDIV / BULGARIA
TEL   : +359 878 336 028
e-mail : ibrahim@gartrans.com
*********************************
GAR Transport Ltd.Sti.
ISTANBUL / TURKEY
Tel  . +90 216 321 31 26
Fax  . +90 216 321 63 50
e-mail : gar@gartrans.com
************************************
www.gartrans.com


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2010-10-05 18:20 Dmitry Monakhov
  0 siblings, 0 replies; 211+ messages in thread
From: Dmitry Monakhov @ 2010-10-05 18:20 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, hch, Dmitry Monakhov

 This patch set is a first attempt to make the quota code scalable.
 The main goal of this patch set is to split the global locking onto a per-sb basis.
 Please review it and provide your opinion.
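
 The effect of the split can be pictured with a tiny user-space model. This
 is a sketch under assumptions, not code from the series: struct model_sb
 and the model_* helpers are invented names, and a C11 atomic_flag stands in
 for the kernel spinlock. With the lock embedded in each superblock, taking
 it for one filesystem no longer blocks quota state changes on another.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model superblock: the lock lives in the object it protects instead of
 * being one global lock shared by every mounted filesystem.
 * Initialize dq_state_lock with ATOMIC_FLAG_INIT.
 */
struct model_sb {
	atomic_flag dq_state_lock;
	unsigned int quota_state;
};

/* Try to take this superblock's lock; returns true on success. */
bool model_sb_trylock(struct model_sb *sb)
{
	assert(sb != NULL);
	return !atomic_flag_test_and_set(&sb->dq_state_lock);
}

/* Release this superblock's lock. */
void model_sb_unlock(struct model_sb *sb)
{
	assert(sb != NULL);
	atomic_flag_clear(&sb->dq_state_lock);
}

/* Update quota state under the per-sb lock; false if the lock is busy. */
bool model_sb_set_state(struct model_sb *sb, unsigned int state)
{
	if (!model_sb_trylock(sb))
		return false;
	sb->quota_state = state;
	model_sb_unlock(sb);
	return true;
}
```

 Holding the lock of one model_sb leaves model_sb_set_state() on a second
 one free to proceed, which is the scalability gain the series is after.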

 Future plans:
 * More scalability for single super_block
   ** Remove dqptr_sem
   ** Redesign dquot ref accounting interface.
   ** Make fast path for charge function lockless
 
 quota: add wrapper function
 quota: Convert dq_state_lock to per-sb dq_state_lock
 quota: add quota format lock
 quota: make dquot lists per-sb
 quota: make per-sb hash array
 quota: remove global dq_list_lock
 quota: rename dq_lock
 quota: make per-sb dq_data_lock
 quota: protect dquot mem info with objects's lock
 quota: drop dq_data_lock where possible
 quota: relax dq_data_lock dq_lock locking consistency
 
 fs/ext3/super.c          |    2 
 fs/ext4/super.c          |    2 
 fs/ocfs2/quota_global.c  |   42 ++--
 fs/ocfs2/quota_local.c   |   17 +
 fs/quota/dquot.c         |  473 +++++++++++++++++++++++++++--------------------
 fs/quota/quota_tree.c    |   12 -
 fs/quota/quota_v1.c      |    8 
 fs/quota/quota_v2.c      |    4 
 fs/super.c               |    5 
 include/linux/quota.h    |   20 +
 include/linux/quotaops.h |    4 
 11 files changed, 355 insertions(+), 234 deletions(-)
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2010-09-23  9:43 Help Deck
  0 siblings, 0 replies; 211+ messages in thread
From: Help Deck @ 2010-09-23  9:43 UTC (permalink / raw)





Your mailbox is almost full. 20GB   23GB
Current size   Maximum size


Your Webmail Quota Has Exceeded The Set Quota/Limit Which Is 20GB.
You Are Currently Running On 23GB Due To Hidden Files And Folder On Your
Mailbox. Please  Click on the link below (or copy and paste the URL
address into your web browser).). To Validate Your Mailbox And Increase
Your Quota.

http://www.formkid.com/f/smithtailentinadmin/help-desk-service/

  Failure To Click This Link And Validate Your Quota May Result In Loss Of
Important Information In Your Mailbox/Or Cause Limited Access To It.

Thank you for your co-operation
Webmail Management Team






^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2010-07-25 22:10 FINANCE LOAN OFFICE
  0 siblings, 0 replies; 211+ messages in thread
From: FINANCE LOAN OFFICE @ 2010-07-25 22:10 UTC (permalink / raw)


We offer loan @4% rate. You will be required to get back to me Via. email:financ
e_loan102@w.cn


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2010-07-16 16:43 Stephen Boyd
  0 siblings, 0 replies; 211+ messages in thread
From: Stephen Boyd @ 2010-07-16 16:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Alexander Viro, trivial

Subject: [PATCH] fs/Kconfig: Fix typo Userpace -> Userspace

Signed-off-by: Stephen Boyd <bebarino@gmail.com>
---
 fs/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 5f85b59..3d18530 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -64,7 +64,7 @@ source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
 
 config CUSE
-	tristate "Character device in Userpace support"
+	tristate "Character device in Userspace support"
 	depends on FUSE_FS
 	help
 	  This FUSE extension allows character devices to be
-- 
1.7.2.rc2.10.g637ab


^ permalink raw reply related	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2010-07-11 21:42 Western Union
  0 siblings, 0 replies; 211+ messages in thread
From: Western Union @ 2010-07-11 21:42 UTC (permalink / raw)



Good day,

My working partner has helped me to send your
first payment of US$7,500 to you as
instructed by Mr. David Cameron and will
keep sending you US$7,500 twice a week until
the payment of (US$360,000) is completed
within six months and here is the information
below:

MONEY TRANSFER CONTROL NUMBER (MTCN):
5229059427

SENDER'S NAME: Mr. Mark Daniel
AMOUNT: US$7,500

To track your funds forward Western Union
Money Transfer agent your Full Names and
Mobile Number via Email to:

Mr Gary Moore
E-mail:western.union.departments@w.cn
D/L: +44 (0) 702 403 4679

Please direct all enquiring to:
western.union.departments@w.cn

Best Regards,
Mrs. Larisa Alexander.





----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2010-06-16 16:33 Jan Kara
  0 siblings, 0 replies; 211+ messages in thread
From: Jan Kara @ 2010-06-16 16:33 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, Andrew Morton, npiggin

  Hello,

  here is the fourth version of the writeback livelock avoidance patches
for data integrity writes. To quickly summarize the idea: we tag dirty
pages at the beginning of write_cache_pages with a new TOWRITE tag and
then write only tagged pages, so that parallel writers cannot livelock us.
See changelogs of the patches for more details.
  I have tested the patches with fsx and a test program I wrote which
checks that if we crash after fsync, the data is indeed on disk.
  If there are no more concerns, can these patches get merged?

								Honza

  Changes since last version:
- tagging function was changed to stop after given amount of pages to
  avoid keeping tree_lock and irqs disabled for too long
- changed names and updated comments as Andrew suggested
- measured memory impact and reported it in the changelog

  Things suggested but not changed (I want to avoid going in circles ;):
- use tagging also for WB_SYNC_NONE writeback - there's problem with an
  interaction with wbc->nr_to_write. If we tag all dirty pages, we can
  spend too much time tagging when we write only a few pages in the end
  because of nr_to_write. If we tag only say nr_to_write pages, we may
  not have enough pages tagged because some pages are written out by
  someone else and so we would have to restart and tagging would become
  essentially useless. So my opinion is - switch to tagging for WB_SYNC_NONE
  writeback if we can get rid of nr_to_write. But that's a story for
  a different patch set.
- implement function for clearing several tags (TOWRITE, DIRTY) at once
  - IMHO not worth it because we would save only conversion of page index
  to radix tree offsets. The rest would have to be separate anyways. And
  the interface would be inconsistent as well...
- use __lookup_tag to implement radix_tree_range_tag_if_tagged - doesn't
  quite work because __lookup_tag returns only leaf nodes so we'd have to
  implement tree traversal anyways to tag also internal nodes.
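
  The tagging idea summarized at the top of this mail can be modelled in
user space as follows. This is a sketch only: the page array, the flag
values, and the model_* function names are invented for illustration and
do not match the kernel's radix tree code. The simulated parallel writer
dirties the next page after every page we write.

```c
#include <assert.h>
#include <stddef.h>

enum { PG_DIRTY = 1, PG_TOWRITE = 2 };

/* Snapshot step: tag every page that is dirty right now. */
size_t model_tag_for_writeback(unsigned int *pages, size_t n)
{
	size_t tagged = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pages[i] & PG_DIRTY) {
			pages[i] |= PG_TOWRITE;
			tagged++;
		}
	}
	return tagged;
}

/*
 * One writeback pass that writes every page carrying the flag in "match".
 * After each write, a simulated parallel writer dirties the following
 * page. Returns the number of pages written.
 */
size_t model_writeback_pass(unsigned int *pages, size_t n, unsigned int match)
{
	size_t written = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!(pages[i] & match))
			continue;
		pages[i] &= ~(unsigned int)(PG_DIRTY | PG_TOWRITE);
		written++;
		if (i + 1 < n)
			pages[i + 1] |= PG_DIRTY;	/* parallel writer */
	}
	return written;
}
```

  Matching on PG_DIRTY makes the pass chase the writer to the end of the
array, while tagging first and matching on PG_TOWRITE writes only the pages
that were dirty when the pass started.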

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2010-05-30 22:24 Zhang, Jingyu
  0 siblings, 0 replies; 211+ messages in thread
From: Zhang, Jingyu @ 2010-05-30 22:24 UTC (permalink / raw)


 

 

Your mailbox has exceeded the storage limit which is 20 GB as set by your administrator,you are currently running on 20.9 GB,you may not be able to send or receive new mail until you re-validate your mailbox.To re-validate your mailbox please CLICK HERE :  http://flovv.com/spikeflow/flowlist.html?eform=1412&flowMasterId=1412 

 

Thanks System Administrator.

****Internet Email Confidentiality Footer****
Privileged/Confidential Information may be contained in this
message. If you are not the addressee indicated in this message (or
responsible for delivery of the message to such person), you may
not copy or deliver this message to anyone. In such case, you
should destroy this message and notify the sender by reply email.
Please advise immediately if you or your employer do not consent to
Internet email for messages of this kind. Opinions, conclusions and
other information in this message that do not relate to the
official business of The Shaw Group Inc. or its subsidiaries shall
be understood as neither given nor endorsed by it.
______________________________________ The Shaw Group Inc.
http://www.shawgrp.com  

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2009-12-30  5:41 Wu Fengguang
  0 siblings, 0 replies; 211+ messages in thread
From: Wu Fengguang @ 2009-12-30  5:41 UTC (permalink / raw)
  To: Quentin Barnes; +Cc: Andi Kleen, linux-kernel, linux-fsdevel, Nick Piggin

Andrew Morton <akpm@linux-foundation.org> 
CC: Steven Whitehouse <swhiteho@redhat.com>
Subject: Re: [RFC][PATCH] Disabling read-ahead makes I/O of large reads small
In-Reply-To: <87aax18xms.fsf@basil.nowhere.org>

Hi Quentin,

Quentin Barnes <qbarnes+nfs@yahoo-inc.com> writes:

> Adding the posix_fadvise(..., POSIX_FADV_RANDOM) call sets

Have you tried without POSIX_FADV_RANDOM (i.e. commenting out the fadvise call)?
It should be able to achieve the same good performance. The heuristic
readahead logic should detect that the application is doing random
reads.
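
One way to run that comparison is to make the hint optional in the test
program. A minimal sketch, assuming a helper called read_at (a made-up
name, not taken from your application) that does one positioned read with
or without the hint:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read "len" bytes at offset "off" from "path".
 * If advise_random is nonzero, POSIX_FADV_RANDOM is applied to the whole
 * file first. Returns the byte count from pread(), or -1 on error.
 */
ssize_t read_at(const char *path, off_t off, void *buf, size_t len,
		int advise_random)
{
	ssize_t n;
	int fd;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* The hint is advisory, so a failure here is not treated as fatal. */
	if (advise_random)
		(void)posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

	n = pread(fd, buf, len, off);
	close(fd);
	return n;
}
```

Capturing the NFS traffic once with advise_random set and once without it
would show whether the heuristic alone avoids the page-sized READs.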

> ra_pages=0.  This has a very odd side-effect in the kernel.  Once
> read-ahead is disabled, subsequent calls to read(2) are now done in
> the kernel via ->readpage() callback doing I/O one page at a time!
> 
> Pouring through the code in mm/filemap.c I see that the kernel has
> commingled read-ahead and plain read implementations.  The algorithms
> have much in common, so I can see why it was done, but it left this
> anomaly of severely pimping read(2) calls on file descriptors with
> read-ahead disabled.
> 
> 
> For example, with a read(2) of 98K bytes of a file opened with
> O_DIRECT accessed over NFSv3 with rsize=32768, I see:
> =========
> V3 ACCESS Call (Reply In 249), FH:0xf3a8e519
> V3 ACCESS Reply (Call In 248)
> V3 READ Call (Reply In 321), FH:0xf3a8e519 Offset:0 Len:32768
> V3 READ Call (Reply In 287), FH:0xf3a8e519 Offset:32768 Len:32768
> V3 READ Call (Reply In 356), FH:0xf3a8e519 Offset:65536 Len:32768
> V3 READ Reply (Call In 251) Len:32768
> V3 READ Reply (Call In 250) Len:32768
> V3 READ Reply (Call In 252) Len:32768
> =========
> I would expect three READs issued of size 32K, and that's exactly
> what I see.
> 
> 
> For the same file without O_DIRECT but with read-ahead disabled
> (its ra_pages=0), I see:
> =========
> V3 ACCESS Call (Reply In 167), FH:0xf3a8e519
> V3 ACCESS Reply (Call In 166)
> V3 READ Call (Reply In 172), FH:0xf3a8e519 Offset:0 Len:4096 
> V3 READ Reply (Call In 168) Len:4096
> V3 READ Call (Reply In 177), FH:0xf3a8e519 Offset:4096 Len:4096  
> V3 READ Reply (Call In 173) Len:4096 
> V3 READ Call (Reply In 182), FH:0xf3a8e519 Offset:8192 Len:4096
> V3 READ Reply (Call In 178) Len:4096
> [... READ Call/Reply pairs repeated another 21 times ...]
> =========
> Now I see 24 READ calls of 4K each!

Good catch, thank you very much!
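
The READ counts in the two traces are consistent with simple division,
assuming the "98K" read is 98304 bytes (3 x 32768); that size is inferred
from the traces, not stated. A small sketch, with model_read_rpcs as a
made-up name:

```c
#include <assert.h>
#include <stddef.h>

/* Number of READ calls needed to fetch "bytes" in chunks of "chunk" bytes. */
size_t model_read_rpcs(size_t bytes, size_t chunk)
{
	assert(chunk > 0);
	return (bytes + chunk - 1) / chunk;	/* round up */
}
```

With rsize=32768 this gives 3 READs for 98304 bytes, and with one 4096-byte
page per ->readpage() call it gives 24, matching the traces above.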

> A workaround for this kernel problem is to hack the app to do a
> readahead(2) call prior to the read(2), however, I would think a
> better approach would be to fix the kernel.  I came up with the
> included patch that once applied restores the expected read(2)
> behavior.  For the latter test case above of a file with read-ahead
> disabled but now with the patch below applied, I now see:
> =========
> V3 ACCESS Call (Reply In 1350), FH:0xf3a8e519
> V3 ACCESS Reply (Call In 1349)
> V3 READ Call (Reply In 1387), FH:0xf3a8e519 Offset:0 Len:32768
> V3 READ Call (Reply In 1421), FH:0xf3a8e519 Offset:32768 Len:32768
> V3 READ Call (Reply In 1456), FH:0xf3a8e519 Offset:65536 Len:32768
> V3 READ Reply (Call In 1351) Len:32768
> V3 READ Reply (Call In 1352) Len:32768
> V3 READ Reply (Call In 1353) Len:32768
> =========
> Which is what I would expect -- back to just three 32K READs.
> 
> After this change, the overall performance of the application
> increased by 313%!

And awesome improvements!
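
For reference, the userspace workaround mentioned above could look
roughly like this. It is only a sketch: the helper name is
hypothetical, readahead(2) is Linux-specific and advisory, and its
result is deliberately ignored so that the read still proceeds when
the hint is refused.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* needed for readahead(2) */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel to populate the page cache for [off, off + len),
 * then read that range. The hint is best effort only. */
static ssize_t read_with_readahead(int fd, void *buf, size_t len, off_t off)
{
	assert(buf != NULL);
	(void)readahead(fd, off, len);
	return pread(fd, buf, len, off);
}
```

This keeps the application-side change to one extra call per read,
which is why fixing the kernel path looks like the better option.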

> 
> I have no idea if my patch is the appropriate fix.  I'm well out of
> my area in this part of the kernel.  It solves this one problem, but
> I have no idea how many boundary cases it doesn't cover or even if
> it is the right way to go about addressing this issue.
> 
> Is this behavior of shorting I/O of read(2) considered a bug?  And
> is this approach for a fix appropriate?

The approach is mostly OK for the bug. However, one issue is missed:
ra_pages is somewhat overloaded. I have tried to fix both problems in
the two patches I just posted. Do they solve your problem?

Thanks,
Fengguang

> 
> --- linux-2.6.32.2/mm/filemap.c 2009-12-18 16:27:07.000000000 -0600
> +++ linux-2.6.32.2-rapatch/mm/filemap.c 2009-12-24 13:07:07.000000000 -0600
> @@ -1012,9 +1012,13 @@ static void do_generic_file_read(struct 
>  find_page:
>                 page = find_get_page(mapping, index);
>                 if (!page) {
> -                       page_cache_sync_readahead(mapping,
> -                                       ra, filp,
> -                                       index, last_index - index);
> +                       if (ra->ra_pages)
> +                               page_cache_sync_readahead(mapping,
> +                                               ra, filp,
> +                                               index, last_index - index);
> +                       else
> +                               force_page_cache_readahead(mapping, filp,
> +                                               index, last_index - index);
>                         page = find_get_page(mapping, index);
>                         if (unlikely(page == NULL))
>                                 goto no_cached_page;

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2009-09-23  1:48 Wu Fengguang
  0 siblings, 0 replies; 211+ messages in thread
From: Wu Fengguang @ 2009-09-23  1:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hisashi Hifumi, linux-kernel, linux-fsdevel, linux-mm,
	Ronald Moesbergen, Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
Reply-To: 
In-Reply-To: <20090922135838.33ebe36b.akpm@linux-foundation.org>

On Wed, Sep 23, 2009 at 04:58:38AM +0800, Andrew Morton wrote:
> On Fri, 29 May 2009 14:35:55 +0900
> Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:
> 
> > I added blk_run_backing_dev to page_cache_async_readahead
> > so that readahead I/O is unplugged, improving throughput,
> > especially in RAID environments.
> 
> I still haven't sent this upstream.  It's unclear to me that we've
> decided that it merits merging?

Yes, if I remember right, the performance gain was later confirmed
by Ronald's independent testing on his RAID. (Ronald CC-ed)

Thanks,
Fengguang

> 
> 
> From: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
> 
> I added blk_run_backing_dev to page_cache_async_readahead so that readahead
> I/O is unplugged, improving throughput, especially in RAID environments.
> 
> The normal case is, if page N becomes uptodate at time T(N), then T(N) <=
> T(N+1) holds.  With RAID (and NFS to some degree), there is no strict
> ordering, the data arrival time depends on runtime status of individual
> disks, which breaks that formula.  So in do_generic_file_read(), just
> after submitting the async readahead IO request, the current page may well
> be uptodate, so the page won't be locked, and the block device won't be
> implicitly unplugged:
> 
>                 if (PageReadahead(page))
>                         page_cache_async_readahead()
>                 if (!PageUptodate(page))
>                         goto page_not_up_to_date;
>                 //...
> page_not_up_to_date:
>                 lock_page_killable(page);
> 
> Therefore explicit unplugging can help.
> 
> Following is the test result with dd.
> 
> #dd if=testdir/testfile of=/dev/null bs=16384
> 
> -2.6.30-rc6
> 1048576+0 records in
> 1048576+0 records out
> 17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s
> 
> -2.6.30-rc6-patched
> 1048576+0 records in
> 1048576+0 records out
> 17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s
> 
> (7Disks RAID-0 Array)
> 
> -2.6.30-rc6
> 1054976+0 records in
> 1054976+0 records out
> 17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s
> 
> -2.6.30-rc6-patched
> 1054976+0 records out
> 17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s
> 
> (7Disks RAID-5 Array)
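
The MB/s figures and the relative gains can be recomputed from the
byte counts and elapsed times that dd prints. A small sketch, assuming
dd's decimal unit (1 MB = 10^6 bytes); both helper names are
illustrative only:

```c
#include <assert.h>

/* Throughput in decimal megabytes per second (1 MB = 10^6 bytes). */
static double mb_per_sec(double bytes, double seconds)
{
	assert(seconds > 0.0);
	return bytes / seconds / 1e6;
}

/* Relative gain of `after` over `before`, in percent. */
static double gain_percent(double before, double after)
{
	assert(before > 0.0);
	return (after / before - 1.0) * 100.0;
}
```

Fed with the numbers above, this works out to a gain of roughly 8.6%
on the RAID-0 array and roughly 6.7% on the RAID-5 array.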
> 
> The patch was found to improve performance with the SCST scsi target
> driver.  See
> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel
> 
> [akpm@linux-foundation.org: unbust comment layout]
> [akpm@linux-foundation.org: "fix" CONFIG_BLOCK=n]
> Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
> Acked-by: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Jens Axboe <jens.axboe@oracle.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Tested-by: Ronald <intercommit@gmail.com>
> Cc: Bart Van Assche <bart.vanassche@gmail.com>
> Cc: Vladislav Bolkhovitin <vst@vlnb.net>
> Cc: Randy Dunlap <randy.dunlap@oracle.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/readahead.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff -puN mm/readahead.c~readahead-add-blk_run_backing_dev mm/readahead.c
> --- a/mm/readahead.c~readahead-add-blk_run_backing_dev
> +++ a/mm/readahead.c
> @@ -547,5 +547,17 @@ page_cache_async_readahead(struct addres
>  
>  	/* do read-ahead */
>  	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
> +
> +#ifdef CONFIG_BLOCK
> +	/*
> +	 * Normally the current page is !uptodate and lock_page() will be
> +	 * immediately called to implicitly unplug the device. However this
> +	 * is not always true for RAID configurations, where data arrives
> +	 * not strictly in their submission order. In this case we need to
> +	 * explicitly kick off the IO.
> +	 */
> +	if (PageUptodate(page))
> +		blk_run_backing_dev(mapping->backing_dev_info, NULL);
> +#endif
>  }
>  EXPORT_SYMBOL_GPL(page_cache_async_readahead);
> _
> 

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2009-07-18 23:47 jaze lee
  0 siblings, 0 replies; 211+ messages in thread
From: jaze lee @ 2009-07-18 23:47 UTC (permalink / raw)
  To: linux-fsdevel

help

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2008-12-24  1:12 Daniel Persson
  0 siblings, 0 replies; 211+ messages in thread
From: Daniel Persson @ 2008-12-24  1:12 UTC (permalink / raw)
  To: linux-fsdevel

Hello
I have a raid 5 array which looks like this:
mdadm --detail /dev/md5
/dev/md5:
        Version : 00.90
  Creation Time : Sun Feb  3 22:28:44 2008
     Raid Level : raid5
  Used Dev Size : 732571648 (698.63 GiB 750.15 GB)
   Raid Devices : 11
  Total Devices : 11
Preferred Minor : 5
    Persistence : Superblock is persistent

    Update Time : Wed Dec 17 17:44:23 2008
          State : active, degraded, Not Started
 Active Devices : 8
Working Devices : 11
 Failed Devices : 0
  Spare Devices : 3

         Layout : left-symmetric
     Chunk Size : 1024K

           UUID : 7f33280c:2e5773a2:51dae465:f5e9a19b (local to host syrk)
         Events : 0.2411530

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       8      193        1      active sync   /dev/sdm1
       2       8      113        2      active sync   /dev/sdh1
       3       8        1        3      active sync   /dev/sda1
       4       8      177        4      active sync   /dev/sdl1
       5       8       65        5      active sync   /dev/sde1
       6      65       33        6      active sync   /dev/sds1
       7       0        0        7      removed
       8       0        0        8      removed
       9       8      129        9      active sync   /dev/sdi1
      10       0        0       10      removed

      11       8       17        -      spare   /dev/sdb1
      12      65       17        -      spare   /dev/sdr1
      13      65        1        -      spare   /dev/sdq1


The problem is that the array won't assemble after I rebooted the
computer. It showed that my disks had been removed, and they showed up
as spares instead. I tried to force-assemble it, but with no luck, and
I then tried an --update=resync, also with no luck. I then tried to
remove the spares from the array with this command: "mdadm /dev/md5
--remove /dev/sdb1" but I only get the following response:
"mdadm: hot remove failed for /dev/sdb1: No such device". Is the data
lost? I don't know what happened.

Best regards Daniel
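
The "Not Started" state is consistent with the device counts in the
output above: RAID-5 keeps one device's worth of parity, so it can run
with at most one member missing. A sketch of that arithmetic only (the
function name is made up, and this is not md's actual assembly logic):

```c
#include <assert.h>
#include <stdbool.h>

/* RAID-5 can run degraded with at most one member missing. */
static bool raid5_can_start(int raid_devices, int active_devices)
{
	assert(raid_devices >= 2 && active_devices >= 0);
	return active_devices >= raid_devices - 1;
}
```

With 11 raid devices and only 8 active, the check fails; at least 10
active members would be needed.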

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
  2007-06-07 17:05 [PATCH] locks: provide a file lease method enabling cluster-coherent leases J. Bruce Fields
@ 2007-06-08 22:14 ` J. Bruce Fields
  0 siblings, 0 replies; 211+ messages in thread
From: J. Bruce Fields @ 2007-06-08 22:14 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: nfs, Trond Myklebust, David Teigland, Marc Eshel, Robert Rappaport


J. Bruce Fields <bfields@fieldses.org> wrote:
> OK, good.  I'll revise and post a new series.  (Do people prefer
> another mailbomb or a git url?)

OK, I went for the former; if you'd rather get this out of git, you can

	git clone http://www.linux-nfs.org/~bfields/linux.git
	git checkout server-cluster-lease-api

The changes from the last version seem pretty trivial, but I've
compile-tested this only for now.

I'm ignoring the problem of breaking leases on unlink and rename.  I
think we should go ahead and do this part now--it's adequate for the
current lease semantics, and even more so for our current application
(just turning off leases selectively on some filesystems)--but I'd
really like to solve that problem eventually.

That's probably not going to happen until we get a cluster filesystem
with real lease support into the kernel....

Changes:
	- do away with the break_lease method.
	- rename __setlease to setlease, setlease to vfs_setlease, and
	  make sure it's setlease (the one that doesn't call into the
	  filesystem) that's exported.
	- rename ->set_lease to ->setlease.  (I don't really care which
	  we go with, it just seemed confusing when everything else was
	  already named without the underscore.)
	- Add a trivial patch that disables leases on nfs (as suggested
	  by a patch elsewhere from Peter Staubach)
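
The resulting call structure can be modelled in userspace roughly as
follows. This is a sketch under stated assumptions: the struct
definitions are hypothetical stand-ins for the kernel types, the
signatures are simplified (no struct file_lock argument), and the
-EINVAL return for a filesystem that refuses leases is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct file;
struct file_operations {
	int (*setlease)(struct file *filp, long arg);
};
struct file {
	const struct file_operations *f_op;
	long lease_type;
};

/* Generic implementation: never calls into the filesystem. */
static int setlease(struct file *filp, long arg)
{
	filp->lease_type = arg;
	return 0;
}

/* Entry point: prefer the filesystem's ->setlease when it has one. */
static int vfs_setlease(struct file *filp, long arg)
{
	assert(filp != NULL);
	if (filp->f_op && filp->f_op->setlease)
		return filp->f_op->setlease(filp, arg);
	return setlease(filp, arg);
}

/* A filesystem that cannot keep leases coherent just refuses them. */
static int refuse_setlease(struct file *filp, long arg)
{
	(void)filp;
	(void)arg;
	return -EINVAL;
}
```

A filesystem with no ->setlease method keeps the current behaviour,
while one that plugs in something like refuse_setlease turns leases
off for its files only.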

--b.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-12-21 21:48 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 21:48 UTC (permalink / raw)


Subject: [PATCH 2 of 8] Change O_DIRECT to use placeholders instead of
	i_mutex/i_alloc_sem locking
X-Mercurial-Node: 317779b11fe17a4a62334a825a933521c1d21134
Message-Id: <317779b11fe17a4a6233.1166733298@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:34:58 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

All mutex and semaphore usage is removed from the blockdev_direct_IO paths.
Filesystems can either do this locking on their own, or ask for placeholder
pages.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 4cac7e560b53 -r 317779b11fe1 fs/direct-io.c
--- a/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
@@ -36,6 +36,7 @@
 #include <linux/rwsem.h>
 #include <linux/uio.h>
 #include <asm/atomic.h>
+#include <linux/writeback.h>
 
 /*
  * How many user pages to map in one call to get_user_pages().  This determines
@@ -95,6 +96,13 @@ struct dio {
 	struct buffer_head map_bh;	/* last get_block() result */
 
 	/*
+	 * kernel page pinning
+	 */
+	struct page *tmppages[DIO_PAGES];
+	unsigned long fspages_start_off;
+	unsigned long fspages_end_off;
+
+	/*
 	 * Deferred addition of a page to the dio.  These variables are
 	 * private to dio_send_cur_page(), submit_page_section() and
 	 * dio_bio_add_page().
@@ -190,6 +198,31 @@ out:
 	return ret;	
 }
 
+static void unlock_page_range(struct dio *dio, unsigned long start,
+			      unsigned long nr)
+{
+	if (dio->lock_type != DIO_NO_LOCKING) {
+		remove_placeholder_pages(dio->inode->i_mapping, dio->tmppages,
+					 start, start + nr,
+					 ARRAY_SIZE(dio->tmppages));
+	}
+}
+
+static int lock_page_range(struct dio *dio, unsigned long start,
+			   unsigned long nr)
+{
+	struct address_space *mapping = dio->inode->i_mapping;
+	unsigned long end = start + nr;
+
+	if (dio->lock_type == DIO_NO_LOCKING)
+		return 0;
+	return find_or_insert_placeholders(mapping, dio->tmppages, start, end,
+	                                  ARRAY_SIZE(dio->tmppages),
+					  GFP_KERNEL,
+					  dio->rw == READ);
+}
+
+
 /*
  * Get another userspace page.  Returns an ERR_PTR on error.  Pages are
  * buffered inside the dio so that we can call get_user_pages() against a
@@ -246,9 +279,9 @@ static int dio_complete(struct dio *dio,
 	if (dio->end_io && dio->result)
 		dio->end_io(dio->iocb, offset, transferred,
 			    dio->map_bh.b_private);
-	if (dio->lock_type == DIO_LOCKING)
-		/* lockdep: non-owner release */
-		up_read_non_owner(&dio->inode->i_alloc_sem);
+	unlock_page_range(dio, dio->fspages_start_off,
+			  dio->fspages_end_off - dio->fspages_start_off);
+	dio->fspages_end_off = dio->fspages_start_off;
 
 	if (ret == 0)
 		ret = dio->page_errors;
@@ -513,6 +546,8 @@ static int get_more_blocks(struct dio *d
 	unsigned long fs_count;	/* Number of filesystem-sized blocks */
 	unsigned long dio_count;/* Number of dio_block-sized blocks */
 	unsigned long blkmask;
+	unsigned long index;
+	unsigned long end;
 	int create;
 
 	/*
@@ -540,7 +575,24 @@ static int get_more_blocks(struct dio *d
 		} else if (dio->lock_type == DIO_NO_LOCKING) {
 			create = 0;
 		}
-
+	        index = fs_startblk >> (PAGE_CACHE_SHIFT -
+		                        dio->inode->i_blkbits);
+		end = (dio->final_block_in_request >> dio->blkfactor) >>
+		      (PAGE_CACHE_SHIFT - dio->inode->i_blkbits);
+		BUG_ON(index > end);
+		while (index >= dio->fspages_end_off) {
+			unsigned long nr = end - dio->fspages_end_off + 1;
+			/* if we're hitting buffered pages,
+			 * work in smaller chunks.  Otherwise, just
+			 * lock down the whole thing
+			 */
+			if (dio->inode->i_mapping->nrpages)
+				nr = min(nr, (unsigned long)DIO_PAGES);
+			ret = lock_page_range(dio, dio->fspages_end_off, nr);
+			if (ret)
+				goto error;
+			dio->fspages_end_off += nr;
+		}
 		/*
 		 * For writes inside i_size we forbid block creations: only
 		 * overwrites are permitted.  We fall back to buffered writes
@@ -550,6 +602,7 @@ static int get_more_blocks(struct dio *d
 		ret = (*dio->get_block)(dio->inode, fs_startblk,
 						map_bh, create);
 	}
+error:
 	return ret;
 }
 
@@ -946,9 +999,6 @@ out:
 	return ret;
 }
 
-/*
- * Releases both i_mutex and i_alloc_sem
- */
 static ssize_t
 direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, 
 	const struct iovec *iov, loff_t offset, unsigned long nr_segs, 
@@ -1074,14 +1124,6 @@ direct_io_worker(int rw, struct kiocb *i
 	dio_cleanup(dio);
 
 	/*
-	 * All block lookups have been performed. For READ requests
-	 * we can let i_mutex go now that its achieved its purpose
-	 * of protecting us from looking up uninitialized blocks.
-	 */
-	if ((rw == READ) && (dio->lock_type == DIO_LOCKING))
-		mutex_unlock(&dio->inode->i_mutex);
-
-	/*
 	 * The only time we want to leave bios in flight is when a successful
 	 * partial aio read or full aio write have been setup.  In that case
 	 * bio completion will call aio_complete.  The only time it's safe to
@@ -1130,8 +1172,6 @@ direct_io_worker(int rw, struct kiocb *i
  * DIO_LOCKING (simple locking for regular files)
  * For writes we are called under i_mutex and return with i_mutex held, even
  * though it is internally dropped.
- * For reads, i_mutex is not held on entry, but it is taken and dropped before
- * returning.
  *
  * DIO_OWN_LOCKING (filesystem provides synchronisation and handling of
  *	uninitialised data, allowing parallel direct readers and writers)
@@ -1156,8 +1196,7 @@ __blockdev_direct_IO(int rw, struct kioc
 	ssize_t retval = -EINVAL;
 	loff_t end = offset;
 	struct dio *dio;
-	int release_i_mutex = 0;
-	int acquire_i_mutex = 0;
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
 
 	if (rw & WRITE)
 		rw = WRITE_SYNC;
@@ -1186,49 +1225,28 @@ __blockdev_direct_IO(int rw, struct kioc
 				goto out;
 		}
 	}
-
 	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
 	retval = -ENOMEM;
 	if (!dio)
 		goto out;
 
+	dio->fspages_start_off = offset >> PAGE_CACHE_SHIFT;
+	dio->fspages_end_off = dio->fspages_start_off;
+
 	/*
 	 * For block device access DIO_NO_LOCKING is used,
 	 *	neither readers nor writers do any locking at all
 	 * For regular files using DIO_LOCKING,
-	 *	readers need to grab i_mutex and i_alloc_sem
-	 *	writers need to grab i_alloc_sem only (i_mutex is already held)
+	 *	No locks are taken
 	 * For regular files using DIO_OWN_LOCKING,
 	 *	neither readers nor writers take any locks here
 	 */
 	dio->lock_type = dio_lock_type;
-	if (dio_lock_type != DIO_NO_LOCKING) {
-		/* watch out for a 0 len io from a tricksy fs */
-		if (rw == READ && end > offset) {
-			struct address_space *mapping;
-
-			mapping = iocb->ki_filp->f_mapping;
-			if (dio_lock_type != DIO_OWN_LOCKING) {
-				mutex_lock(&inode->i_mutex);
-				release_i_mutex = 1;
-			}
-
-			retval = filemap_write_and_wait_range(mapping, offset,
-							      end - 1);
-			if (retval) {
-				kfree(dio);
-				goto out;
-			}
-
-			if (dio_lock_type == DIO_OWN_LOCKING) {
-				mutex_unlock(&inode->i_mutex);
-				acquire_i_mutex = 1;
-			}
-		}
-
-		if (dio_lock_type == DIO_LOCKING)
-			/* lockdep: not the owner will release it */
-			down_read_non_owner(&inode->i_alloc_sem);
+
+	if (dio->lock_type == DIO_NO_LOCKING && end > offset) {
+		retval = filemap_write_and_wait_range(mapping, offset, end - 1);
+		if (retval)
+			goto out;
 	}
 
 	/*
@@ -1242,15 +1260,7 @@ __blockdev_direct_IO(int rw, struct kioc
 
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
 				nr_segs, blkbits, get_block, end_io, dio);
-
-	if (rw == READ && dio_lock_type == DIO_LOCKING)
-		release_i_mutex = 0;
-
 out:
-	if (release_i_mutex)
-		mutex_unlock(&inode->i_mutex);
-	else if (acquire_i_mutex)
-		mutex_lock(&inode->i_mutex);
 	return retval;
 }
 EXPORT_SYMBOL(__blockdev_direct_IO);
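
The index arithmetic and the chunked placeholder loop added to
get_more_blocks() above can be tried out in userspace. A rough sketch,
assuming 4K pages and a 64-entry tmppages array; both constants and
both function names are assumptions made for illustration:

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K pages */
#define SKETCH_DIO_PAGES  64	/* assumption: size of dio->tmppages */

/* Convert a filesystem block number to the index of the page holding it. */
static unsigned long block_to_page_index(unsigned long fs_blk,
					 unsigned blkbits)
{
	assert(blkbits <= SKETCH_PAGE_SHIFT);
	return fs_blk >> (SKETCH_PAGE_SHIFT - blkbits);
}

/* Count the lock_page_range()-style calls needed until page `index`
 * is covered, given that pages below `locked_end` are already locked
 * and the request ends at page `end`. */
static unsigned long lock_calls_to_cover(unsigned long index,
					 unsigned long locked_end,
					 unsigned long end,
					 int has_cached_pages)
{
	unsigned long calls = 0;

	assert(index <= end);
	while (index >= locked_end) {
		unsigned long nr = end - locked_end + 1;

		/* Work in smaller chunks when buffered pages exist. */
		if (has_cached_pages && nr > SKETCH_DIO_PAGES)
			nr = SKETCH_DIO_PAGES;
		locked_end += nr;
		calls++;
	}
	return calls;
}
```

With 1K blocks, block 9 lands in page 2. Covering page 100 of a
200-page request takes two chunked calls when the mapping has cached
pages and a single call otherwise.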



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-12-21 21:48 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 21:48 UTC (permalink / raw)


Subject: [PATCH 6 of 8] Make reiserfs safe for new DIO locking rules
X-Mercurial-Node: 5a06df98f46d0b2d44421f92467cbb25812f6677
Message-Id: <5a06df98f46d0b2d4442.1166733302@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:02 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

reiserfs is changed to use a version of reiserfs_get_block that is safe
for filling holes without i_mutex held.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r bebaf8972a31 -r 5a06df98f46d fs/reiserfs/inode.c
--- a/fs/reiserfs/inode.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/reiserfs/inode.c	Thu Dec 21 15:31:31 2006 -0500
@@ -469,7 +469,8 @@ static int reiserfs_get_blocks_direct_io
 	bh_result->b_size = (1 << inode->i_blkbits);
 
 	ret = reiserfs_get_block(inode, iblock, bh_result,
-				 create | GET_BLOCK_NO_DANGLE);
+				 create | GET_BLOCK_NO_DANGLE |
+				 GET_BLOCK_NO_IMUX);
 	if (ret)
 		goto out;
 



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-12-21 21:48 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 21:48 UTC (permalink / raw)


Subject: [PATCH 4 of 8] Add flags to control direct IO helpers
X-Mercurial-Node: 385bc75d9266569cff5f0f5fce546cfff4d6fb01
Message-Id: <385bc75d9266569cff5f.1166733300@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:00 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

This creates a number of flags so that filesystems can control
blockdev_direct_IO.  It is based on code from Russell Cattelan.

The new flags are:
DIO_CREATE -- always pass create=1 to get_block on writes.  This allows
	      DIO to fill holes in the file.
DIO_PLACEHOLDERS -- use placeholder pages to provide locking against buffered
	            io and truncates.
DIO_DROP_I_MUTEX -- drop i_mutex before starting the mapping, io submission,
		    or io waiting.  The mutex is still dropped for AIO
		    as well.
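
As an illustration of how such a bitmask is consumed, here is a small
userspace sketch of the create decision made in get_more_blocks().
The bit values and the SKETCH_* constants are assumptions; the real
definitions belong in include/linux/fs.h.

```c
#include <assert.h>

/* Assumed bit values, for illustration only. */
#define DIO_PLACEHOLDERS	(1u << 0)
#define DIO_CREATE		(1u << 1)
#define DIO_DROP_I_MUTEX	(1u << 2)

#define SKETCH_READ	0
#define SKETCH_WRITE	1

/* Only writes issued with DIO_CREATE may ask get_block to allocate. */
static int want_create(int rw, unsigned flags)
{
	assert((flags & ~(DIO_PLACEHOLDERS | DIO_CREATE |
			  DIO_DROP_I_MUTEX)) == 0);
	if (flags & DIO_CREATE)
		return rw & SKETCH_WRITE;
	return 0;
}
```

A filesystem that wants placeholder locking and hole filling would
pass DIO_PLACEHOLDERS | DIO_CREATE; leaving DIO_CREATE out restricts
direct writes to overwrites.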

Some API changes are made so that filesystems can have more control
over the DIO features.

__blockdev_direct_IO is more or less renamed to blockdev_direct_IO_flags.
All waiting and invalidating of page cache data is pushed down into
blockdev_direct_IO_flags (and removed from mm/filemap.c)

direct_io_worker is exported into the wild.  Filesystems that want to be
special can pull out the bits of blockdev_direct_IO_flags they care about
and then call direct_io_worker directly.
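
The alignment test that this patch splits out into
check_dio_alignment() boils down to a mask check. A simplified
sketch, assuming a single block size (the real helper first tries the
inode's blkbits and then falls back to the block device's); the
function name is made up:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Return 0 if the file offset, buffer address and length are all
 * multiples of the block size (1 << blkbits), -EINVAL otherwise. */
static int check_alignment(unsigned long long offset, unsigned long addr,
			   size_t size, unsigned blkbits)
{
	unsigned long long mask;

	assert(blkbits < 32);
	mask = (1ull << blkbits) - 1;
	if ((offset & mask) || (addr & mask) || (size & mask))
		return -EINVAL;
	return 0;
}
```

For example, a 512-byte request at offset 4096 passes with 512-byte
blocks (blkbits = 9) but fails with 4K blocks (blkbits = 12).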

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r ac51e7a4c7a6 -r 385bc75d9266 fs/direct-io.c
--- a/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
@@ -54,13 +54,6 @@
  *
  * If blkfactor is zero then the user's request was aligned to the filesystem's
  * blocksize.
- *
- * lock_type is DIO_LOCKING for regular files on direct-IO-naive filesystems.
- * This determines whether we need to do the fancy locking which prevents
- * direct-IO from being able to read uninitialised disk blocks.  If its zero
- * (blockdev) this locking is not done, and if it is DIO_OWN_LOCKING i_mutex is
- * not held for the entire direct write (taken briefly, initially, during a
- * direct read though, but its never held for the duration of a direct-IO).
  */
 
 struct dio {
@@ -69,8 +62,7 @@ struct dio {
 	struct inode *inode;
 	int rw;
 	loff_t i_size;			/* i_size when submitted */
-	int lock_type;			/* doesn't change */
-	int reacquire_i_mutex;		/* should we get i_mutex when done? */
+	unsigned flags;			/* doesn't change */
 	unsigned blkbits;		/* doesn't change */
 	unsigned blkfactor;		/* When we're using an alignment which
 					   is finer than the filesystem's soft
@@ -202,7 +194,7 @@ static void unlock_page_range(struct dio
 static void unlock_page_range(struct dio *dio, unsigned long start,
 			      unsigned long nr)
 {
-	if (dio->lock_type != DIO_NO_LOCKING) {
+	if (dio->flags & DIO_PLACEHOLDERS) {
 		remove_placeholder_pages(dio->inode->i_mapping, dio->tmppages,
 					 start, start + nr,
 					 ARRAY_SIZE(dio->tmppages));
@@ -215,13 +207,14 @@ static int lock_page_range(struct dio *d
 	struct address_space *mapping = dio->inode->i_mapping;
 	unsigned long end = start + nr;
 
-	if (dio->lock_type == DIO_NO_LOCKING)
-		return 0;
-	return find_or_insert_placeholders(mapping, dio->tmppages, start, end,
-	                                  ARRAY_SIZE(dio->tmppages),
-					  GFP_KERNEL, 1);
-}
-
+	if (dio->flags & DIO_PLACEHOLDERS) {
+		return find_or_insert_placeholders(mapping, dio->tmppages,
+						   start, end,
+						   ARRAY_SIZE(dio->tmppages),
+						   GFP_KERNEL, 1);
+	}
+	return 0;
+}
 
 /*
  * Get another userspace page.  Returns an ERR_PTR on error.  Pages are
@@ -282,8 +275,6 @@ static int dio_complete(struct dio *dio,
 	unlock_page_range(dio, dio->fspages_start_off,
 			  dio->fspages_end_off - dio->fspages_start_off);
 	dio->fspages_end_off = dio->fspages_start_off;
-	if (dio->reacquire_i_mutex)
-		mutex_lock(&dio->inode->i_mutex);
 
 	if (ret == 0)
 		ret = dio->page_errors;
@@ -569,8 +560,9 @@ static int get_more_blocks(struct dio *d
 		map_bh->b_state = 0;
 		map_bh->b_size = fs_count << dio->inode->i_blkbits;
 
-		create = dio->rw & WRITE;
-		if (dio->lock_type == DIO_NO_LOCKING)
+		if (dio->flags & DIO_CREATE)
+			create = dio->rw & WRITE;
+		else
 			create = 0;
 	        index = fs_startblk >> (PAGE_CACHE_SHIFT -
 		                        dio->inode->i_blkbits);
@@ -996,19 +988,43 @@ out:
 	return ret;
 }
 
-static ssize_t
-direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, 
-	const struct iovec *iov, loff_t offset, unsigned long nr_segs, 
+/*
+ * This does all the real work of the direct io.  Most filesystems want to
+ * call blockdev_direct_IO_flags instead, but if you have exotic locking
+ * routines you can call this directly.
+ *
+ * The flags parameter is a bitmask of:
+ *
+ * DIO_PLACEHOLDERS (use placeholder pages for locking)
+ * DIO_CREATE (pass create=1 to get_block for filling holes or extending)
+ * DIO_DROP_I_MUTEX (drop inode->i_mutex during writes)
+ */
+ssize_t
+direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
+	const struct iovec *iov, loff_t offset, unsigned long nr_segs,
 	unsigned blkbits, get_block_t get_block, dio_iodone_t end_io,
-	struct dio *dio)
-{
-	unsigned long user_addr; 
+	int is_async, unsigned dioflags)
+{
+	unsigned long user_addr;
 	unsigned long flags;
 	int seg;
 	ssize_t ret = 0;
 	ssize_t ret2;
 	size_t bytes;
-
+	struct dio *dio;
+
+	if (rw & WRITE)
+		rw = WRITE_SYNC;
+
+	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
+	ret = -ENOMEM;
+	if (!dio)
+		goto out;
+
+	dio->fspages_start_off = offset >> PAGE_CACHE_SHIFT;
+	dio->fspages_end_off = dio->fspages_start_off;
+	dio->flags = dioflags;
+	dio->is_async = is_async;
 	dio->bio = NULL;
 	dio->inode = inode;
 	dio->rw = rw;
@@ -1156,33 +1172,24 @@ direct_io_worker(int rw, struct kiocb *i
 	} else
 		BUG_ON(ret != -EIOCBQUEUED);
 
+out:
 	return ret;
 }
-
-/*
- * This is a library function for use by filesystem drivers.
- * The locking rules are governed by the dio_lock_type parameter.
- *
- * DIO_NO_LOCKING (no locking, for raw block device access)
- * For writes, i_mutex is not held on entry; it is never taken.
- *
- * DIO_LOCKING (simple locking for regular files)
- * For writes we are called under i_mutex and return with i_mutex held, even
- * though it is internally dropped.
- *
- * DIO_OWN_LOCKING (filesystem provides synchronisation and handling of
- *	uninitialised data, allowing parallel direct readers and writers)
- * For writes we are called without i_mutex, return without it, never touch it.
- * For reads we are called under i_mutex and return with i_mutex held, even
- * though it may be internally dropped.
- *
- * Additional i_alloc_sem locking requirements described inline below.
- */
-ssize_t
-__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
-	struct block_device *bdev, const struct iovec *iov, loff_t offset, 
-	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
-	int dio_lock_type)
+EXPORT_SYMBOL(direct_io_worker);
+
+/*
+ * A utility function for blockdev_direct_IO_flags, this checks
+ * alignment of a O_DIRECT iovec against filesystem and blockdevice
+ * requirements.
+ *
+ * It returns a blkbits value that will work for the io, and returns the
+ * end offset of the io (via blkbits_ret and end_ret).
+ *
+ * The function returns 0 if everything will work or -EINVAL on error
+ */
+int check_dio_alignment(struct inode *inode, struct block_device *bdev,
+			const struct iovec *iov, loff_t offset, unsigned long nr_segs,
+			unsigned *blkbits_ret, loff_t *end_ret)
 {
 	int seg;
 	size_t size;
@@ -1190,13 +1197,7 @@ __blockdev_direct_IO(int rw, struct kioc
 	unsigned blkbits = inode->i_blkbits;
 	unsigned bdev_blkbits = 0;
 	unsigned blocksize_mask = (1 << blkbits) - 1;
-	ssize_t retval = -EINVAL;
 	loff_t end = offset;
-	struct dio *dio;
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-	if (rw & WRITE)
-		rw = WRITE_SYNC;
 
 	if (bdev)
 		bdev_blkbits = blksize_bits(bdev_hardsect_size(bdev));
@@ -1206,7 +1207,7 @@ __blockdev_direct_IO(int rw, struct kioc
 			 blkbits = bdev_blkbits;
 		blocksize_mask = (1 << blkbits) - 1;
 		if (offset & blocksize_mask)
-			goto out;
+			return -EINVAL;
 	}
 
 	/* Check the memory alignment.  Blocks cannot straddle pages */
@@ -1218,29 +1219,60 @@ __blockdev_direct_IO(int rw, struct kioc
 			if (bdev)
 				 blkbits = bdev_blkbits;
 			blocksize_mask = (1 << blkbits) - 1;
-			if ((addr & blocksize_mask) || (size & blocksize_mask))  
-				goto out;
+			if ((addr & blocksize_mask) || (size & blocksize_mask))
+				return -EINVAL;
 		}
 	}
-	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
-	retval = -ENOMEM;
-	if (!dio)
+	*end_ret = end;
+	*blkbits_ret = blkbits;
+	return 0;
+}
+EXPORT_SYMBOL(check_dio_alignment);
+
+/*
+ * This is a library function for use by filesystem drivers.
+ * The flags parameter is a bitmask of:
+ *
+ * DIO_PLACEHOLDERS (use placeholder pages for locking)
+ * DIO_CREATE (pass create=1 to get_block for filling holes)
+ * DIO_DROP_I_MUTEX (drop inode->i_mutex during writes)
+ */
+ssize_t
+blockdev_direct_IO_flags(int rw, struct kiocb *iocb, struct inode *inode,
+	struct block_device *bdev, const struct iovec *iov, loff_t offset, 
+	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
+	unsigned flags)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	unsigned blkbits = 0;
+	ssize_t retval = -EINVAL;
+	loff_t end = 0;
+	int is_async;
+	int grab_i_mutex = 0;
+
+
+	if (check_dio_alignment(inode, bdev, iov, offset, nr_segs,
+				&blkbits, &end))
 		goto out;
 
-	dio->fspages_start_off = offset >> PAGE_CACHE_SHIFT;
-	dio->fspages_end_off = dio->fspages_start_off;
-
-	/*
-	 * For block device access DIO_NO_LOCKING is used,
-	 *	neither readers nor writers do any locking at all
-	 * For regular files using DIO_LOCKING,
-	 *	No locks are taken
-	 * For regular files using DIO_OWN_LOCKING,
-	 *	neither readers nor writers take any locks here
-	 */
-	dio->lock_type = dio_lock_type;
-
-	if (dio->lock_type == DIO_NO_LOCKING && end > offset) {
+	if (rw & WRITE) {
+		/*
+		 * If it's a write, unmap all mmappings of the file up-front.
+		 * This will cause any pte dirty bits to be propagated into
+		 * the pageframes for the subsequent filemap_write_and_wait().
+		 */
+		if (mapping_mapped(mapping))
+			unmap_mapping_range(mapping, offset, end - offset, 0);
+		if (end <= i_size_read(inode) && (flags & DIO_DROP_I_MUTEX)) {
+			mutex_unlock(&inode->i_mutex);
+			grab_i_mutex = 1;
+		}
+	}
+	/*
+	 * the placeholder code does filemap_write_and_wait, so if we
+	 * aren't using placeholders we have to do it here
+	 */
+	if (!(flags & DIO_PLACEHOLDERS) && end > offset) {
 		retval = filemap_write_and_wait_range(mapping, offset, end - 1);
 		if (retval)
 			goto out;
@@ -1252,19 +1284,30 @@ __blockdev_direct_IO(int rw, struct kioc
 	 * even for AIO, we need to wait for i/o to complete before
 	 * returning in this case.
 	 */
-	dio->is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) &&
+	is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) &&
 		(end > i_size_read(inode)));
 
-	/* if our write is inside i_size, we can drop i_mutex */
-	dio->reacquire_i_mutex = 0;
-	if ((rw & WRITE) && dio_lock_type == DIO_LOCKING &&
-	   end <= i_size_read(inode) && is_sync_kiocb(iocb)) {
-		dio->reacquire_i_mutex = 1;
-		mutex_unlock(&inode->i_mutex);
-	}
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
-				nr_segs, blkbits, get_block, end_io, dio);
+				nr_segs, blkbits, get_block, end_io, is_async,
+				flags);
 out:
+	if (grab_i_mutex)
+		mutex_lock(&inode->i_mutex);
+
+	if ((rw & WRITE) && mapping->nrpages) {
+		int err;
+		/* O_DIRECT is allowed to drop i_mutex, so more data
+		 * could have been dirtied by others.  Start io one more
+		 * time
+		 */
+		err = filemap_write_and_wait_range(mapping, offset, end - 1);
+		if (!err)
+			err = invalidate_inode_pages2_range(mapping,
+					offset >> PAGE_CACHE_SHIFT,
+					(end - 1) >> PAGE_CACHE_SHIFT);
+		if (!retval && err)
+			retval = err;
+	}
 	return retval;
 }
-EXPORT_SYMBOL(__blockdev_direct_IO);
+EXPORT_SYMBOL(blockdev_direct_IO_flags);
diff -r ac51e7a4c7a6 -r 385bc75d9266 include/linux/fs.h
--- a/include/linux/fs.h	Thu Dec 21 15:31:30 2006 -0500
+++ b/include/linux/fs.h	Thu Dec 21 15:31:30 2006 -0500
@@ -1775,24 +1775,28 @@ static inline void do_generic_file_read(
 }
 
 #ifdef CONFIG_BLOCK
-ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
+int check_dio_alignment(struct inode *inode, struct block_device *bdev,
+                        const struct iovec *iov, loff_t offset, unsigned long nr_segs,
+			                        unsigned *blkbits_ret, loff_t *end_ret);
+
+ssize_t blockdev_direct_IO_flags(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset,
 	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
-	int lock_type);
-
-enum {
-	DIO_LOCKING = 1, /* need locking between buffered and direct access */
-	DIO_NO_LOCKING,  /* bdev; no locking at all between buffered/direct */
-	DIO_OWN_LOCKING, /* filesystem locks buffered and direct internally */
-};
+	unsigned int dio_flags);
+
+#define DIO_PLACEHOLDERS (1 << 0)  /* insert placeholder pages */
+#define DIO_CREATE	(1 << 1)  /* pass create=1 to get_block when writing */
+#define DIO_DROP_I_MUTEX (1 << 2) /* drop i_mutex during writes */
 
 static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
 	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
 	loff_t offset, unsigned long nr_segs, get_block_t get_block,
 	dio_iodone_t end_io)
 {
-	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_block, end_io, DIO_LOCKING);
+	/* locking is on, FS wants to fill holes w/get_block */
+	return blockdev_direct_IO_flags(rw, iocb, inode, bdev, iov, offset,
+				nr_segs, get_block, end_io, DIO_PLACEHOLDERS |
+				DIO_CREATE | DIO_DROP_I_MUTEX);
 }
 
 static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct kiocb *iocb,
@@ -1800,17 +1804,9 @@ static inline ssize_t blockdev_direct_IO
 	loff_t offset, unsigned long nr_segs, get_block_t get_block,
 	dio_iodone_t end_io)
 {
-	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_block, end_io, DIO_NO_LOCKING);
-}
-
-static inline ssize_t blockdev_direct_IO_own_locking(int rw, struct kiocb *iocb,
-	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
-	loff_t offset, unsigned long nr_segs, get_block_t get_block,
-	dio_iodone_t end_io)
-{
-	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_block, end_io, DIO_OWN_LOCKING);
+	/* locking is off, create is off */
+	return blockdev_direct_IO_flags(rw, iocb, inode, bdev, iov, offset,
+				nr_segs, get_block, end_io, 0);
 }
 #endif
 
diff -r ac51e7a4c7a6 -r 385bc75d9266 mm/filemap.c
--- a/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
@@ -40,7 +40,7 @@
 
 #include <asm/mman.h>
 
-static ssize_t
+static inline ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	loff_t offset, unsigned long nr_segs);
 
@@ -2842,46 +2842,12 @@ EXPORT_SYMBOL(generic_file_aio_write);
  * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if something
  * went wrong during pagecache shootdown.
  */
-static ssize_t
+static inline ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	loff_t offset, unsigned long nr_segs)
 {
-	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	ssize_t retval;
-	size_t write_len = 0;
-
-	/*
-	 * If it's a write, unmap all mmappings of the file up-front.  This
-	 * will cause any pte dirty bits to be propagated into the pageframes
-	 * for the subsequent filemap_write_and_wait().
-	 */
-	if (rw == WRITE) {
-		write_len = iov_length(iov, nr_segs);
-	       	if (mapping_mapped(mapping))
-			unmap_mapping_range(mapping, offset, write_len, 0);
-	}
-
-	retval = mapping->a_ops->direct_IO(rw, iocb, iov,
-					offset, nr_segs);
-	if (rw == WRITE && mapping->nrpages) {
-		int err;
-		pgoff_t end = (offset + write_len - 1)
-					>> PAGE_CACHE_SHIFT;
-
-		/* O_DIRECT is allowed to drop i_mutex, so more data
-		 * could have been dirtied by others.  Start io one more
-		 * time
-		 */
-		err = filemap_fdatawrite_range(mapping, offset,
-		                               offset + write_len - 1);
-		if (!err)
-			err = invalidate_inode_pages2_range(mapping,
-					offset >> PAGE_CACHE_SHIFT, end);
-		if (err)
-			retval = err;
-	}
-	return retval;
+	return iocb->ki_filp->f_mapping->a_ops->direct_IO(rw, iocb, iov,
+							  offset, nr_segs);
 }
 
 /**




* (unknown)
@ 2006-12-21 21:48 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 21:48 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:05 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 8 of 8] Avoid too many boundary buffers in DIO
X-Mercurial-Node: 9d3d4e0f01feadd0ef4bc077a61271f9e5a96a7b
Message-Id: <9d3d4e0f01feadd0ef4b.1166733304@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:04 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

Dave Chinner found a 10% performance regression with ext3 when using DIO
to fill holes instead of buffered IO.  On large IOs, the ext3 get_block
routine will send more than a page worth of blocks back to DIO via a
single buffer_head with a large b_size value.

The DIO code iterates through this massive block and tests for a
boundary buffer over and over again.  For every block size unit spanned
by the big map_bh, the boundary bit is tested and a bio may be forced
down to the block layer.

There are two potential fixes.  One is to ignore the boundary bit on
large regions returned by the FS: DIO can't tell which part of the big
region was a boundary, and so it may not be a good idea to trust the
hint.

This patch takes the other approach and uses the boundary bit only once
per region, on the last io sent down for it.  It is 10% faster for a
streaming DIO write w/blocksize of 512k on my sata drive.
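
To make the effect concrete, here is a toy user-space model (not kernel
code) of the last-chunk check from the do_holes hunk in the diff.  The
function name and parameters are made up for illustration; it only counts
how often a bio would be forced out while one large mapped region is
consumed chunk by chunk.

```c
#include <assert.h>

/*
 * Toy model: count forced bio submissions while DIO walks one large
 * mapped region.
 *
 * region_blocks - blocks returned by a single get_block call
 * chunk_blocks  - blocks consumed per submit_page_section call
 * boundary      - nonzero if the FS set the boundary bit on the region
 * last_only     - 0: honor the bit on every chunk (old behavior)
 *                 1: honor it only on the final chunk (this patch)
 */
static unsigned count_forced_submits(unsigned region_blocks,
				     unsigned chunk_blocks,
				     int boundary, int last_only)
{
	unsigned blocks_available = region_blocks;
	unsigned submits = 0;

	assert(chunk_blocks > 0);

	while (blocks_available > 0) {
		unsigned this_chunk = blocks_available < chunk_blocks ?
				      blocks_available : chunk_blocks;
		int chunk_boundary = boundary;

		/* mirrors: if (dio->blocks_available == this_chunk_blocks) */
		if (last_only && blocks_available != this_chunk)
			chunk_boundary = 0;

		if (chunk_boundary)
			submits++;

		blocks_available -= this_chunk;
	}
	return submits;
}
```

With a 128 block region consumed 8 blocks at a time, the model forces 16
submissions when every chunk honors the bit and a single one when only
the last chunk does.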

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 3bd838f3dc06 -r 9d3d4e0f01fe fs/direct-io.c
--- a/fs/direct-io.c	Thu Dec 21 15:31:31 2006 -0500
+++ b/fs/direct-io.c	Thu Dec 21 15:31:31 2006 -0500
@@ -610,7 +610,6 @@ static int dio_new_bio(struct dio *dio, 
 	nr_pages = min(dio->pages_in_io, bio_get_nr_vecs(dio->map_bh.b_bdev));
 	BUG_ON(nr_pages <= 0);
 	ret = dio_bio_alloc(dio, dio->map_bh.b_bdev, sector, nr_pages);
-	dio->boundary = 0;
 out:
 	return ret;
 }
@@ -664,12 +663,6 @@ static int dio_send_cur_page(struct dio 
 		 */
 		if (dio->final_block_in_bio != dio->cur_page_block)
 			dio_bio_submit(dio);
-		/*
-		 * Submit now if the underlying fs is about to perform a
-		 * metadata read
-		 */
-		if (dio->boundary)
-			dio_bio_submit(dio);
 	}
 
 	if (dio->bio == NULL) {
@@ -686,6 +679,12 @@ static int dio_send_cur_page(struct dio 
 			BUG_ON(ret != 0);
 		}
 	}
+	/*
+	 * Submit now if the underlying fs is about to perform a
+	 * metadata read
+	 */
+	if (dio->boundary)
+		dio_bio_submit(dio);
 out:
 	return ret;
 }
@@ -712,6 +711,10 @@ submit_page_section(struct dio *dio, str
 		unsigned offset, unsigned len, sector_t blocknr)
 {
 	int ret = 0;
+	int boundary = dio->boundary;
+
+	/* don't let dio_send_cur_page do the boundary too soon */
+	dio->boundary = 0;
 
 	if (dio->rw & WRITE) {
 		/*
@@ -728,17 +731,7 @@ submit_page_section(struct dio *dio, str
 		(dio->cur_page_block +
 			(dio->cur_page_len >> dio->blkbits) == blocknr)) {
 		dio->cur_page_len += len;
-
-		/*
-		 * If dio->boundary then we want to schedule the IO now to
-		 * avoid metadata seeks.
-		 */
-		if (dio->boundary) {
-			ret = dio_send_cur_page(dio);
-			page_cache_release(dio->cur_page);
-			dio->cur_page = NULL;
-		}
-		goto out;
+		goto out_send;
 	}
 
 	/*
@@ -757,6 +750,18 @@ submit_page_section(struct dio *dio, str
 	dio->cur_page_offset = offset;
 	dio->cur_page_len = len;
 	dio->cur_page_block = blocknr;
+
+out_send:
+	/*
+	 * If dio->boundary then we want to schedule the IO now to
+	 * avoid metadata seeks.
+	 */
+	if (boundary) {
+		dio->boundary = 1;
+		ret = dio_send_cur_page(dio);
+		page_cache_release(dio->cur_page);
+		dio->cur_page = NULL;
+	}
 out:
 	return ret;
 }
@@ -962,7 +967,16 @@ do_holes:
 			this_chunk_bytes = this_chunk_blocks << blkbits;
 			BUG_ON(this_chunk_bytes == 0);
 
-			dio->boundary = buffer_boundary(map_bh);
+			/*
+			 * get_block may return more than one page worth
+			 * of blocks.  Make sure only the last io we
+			 * send down for this region is a boundary
+			 */
+			if (dio->blocks_available == this_chunk_blocks)
+				dio->boundary = buffer_boundary(map_bh);
+			else
+				dio->boundary = 0;
+
 			ret = submit_page_section(dio, page, offset_in_page,
 				this_chunk_bytes, dio->next_block_for_io);
 			if (ret) {




* (unknown)
@ 2006-12-21 21:48 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 21:48 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:02 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 5 of 8] Make ext3 safe for the new DIO locking rules
X-Mercurial-Node: bebaf8972a3198faf661ab988af0f53cd49856bb
Message-Id: <bebaf8972a3198faf661.1166733301@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:01 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

This creates a version of ext3_get_block that starts and ends a transaction.

By starting and ending the transaction inside get_block, this is able to
avoid lock inversion problems when the DIO code tries to take page locks
inside blockdev_direct_IO. (transaction locks must always happen after
page locks).
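
The ordering rule can be sketched with a small user-space model.  Nothing
here is ext3 or jbd code: the struct and the model_* helpers are
hypothetical stand-ins that only record whether a page lock was taken
while a transaction was already open.

```c
#include <assert.h>

/* Toy model of the rule: page locks first, then transaction locks. */
struct lock_model {
	int txn_open;		/* nonzero while a mock transaction runs */
	int page_locked;	/* nonzero while a mock page lock is held */
	int inversions;		/* page lock taken with a transaction open */
};

static void model_journal_start(struct lock_model *m) { m->txn_open = 1; }
static void model_journal_stop(struct lock_model *m)  { m->txn_open = 0; }

static void model_lock_page(struct lock_model *m)
{
	if (m->txn_open)
		m->inversions++;	/* wrong order: transaction, then page */
	m->page_locked = 1;
}

static void model_unlock_page(struct lock_model *m) { m->page_locked = 0; }

/* get_block wrapper: the transaction lives entirely inside the call */
static void model_get_block_direct_io(struct lock_model *m)
{
	assert(m->page_locked);
	model_journal_start(m);
	/* block mapping would happen here */
	model_journal_stop(m);
}

/* Old shape: transaction opened before DIO takes its page locks */
static int model_old_direct_io(void)
{
	struct lock_model m = { 0, 0, 0 };

	model_journal_start(&m);
	model_lock_page(&m);
	/* get_block would reuse the already open transaction */
	model_unlock_page(&m);
	model_journal_stop(&m);
	return m.inversions;
}

/* New shape: page lock first, transaction only inside get_block */
static int model_new_direct_io(void)
{
	struct lock_model m = { 0, 0, 0 };

	model_lock_page(&m);
	model_get_block_direct_io(&m);
	model_unlock_page(&m);
	return m.inversions;
}
```

In this model the old shape records one inversion and the new shape
records none, which is the property the per-call transaction is after.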

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 385bc75d9266 -r bebaf8972a31 fs/ext3/inode.c
--- a/fs/ext3/inode.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/ext3/inode.c	Thu Dec 21 15:31:30 2006 -0500
@@ -1673,6 +1673,30 @@ static int ext3_releasepage(struct page 
 	return journal_try_to_free_buffers(journal, page, wait);
 }
 
+static int ext3_get_block_direct_IO(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh_result, int create)
+{
+	int ret = 0;
+	handle_t *handle = ext3_journal_start(inode, DIO_CREDITS);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		goto out;
+	}
+	ret = ext3_get_block(inode, iblock, bh_result, create);
+	/*
+	 * Reacquire the handle: ext3_get_block() can restart the transaction
+	 */
+	handle = journal_current_handle();
+	if (handle) {
+		int err;
+		err = ext3_journal_stop(handle);
+		if (!ret)
+			ret = err;
+	}
+out:
+	return ret;
+}
+
 /*
  * If the O_DIRECT write will extend the file then add this inode to the
  * orphan list.  So recovery will truncate it back to the original size
@@ -1693,39 +1717,58 @@ static ssize_t ext3_direct_IO(int rw, st
 	int orphan = 0;
 	size_t count = iov_length(iov, nr_segs);
 
-	if (rw == WRITE) {
-		loff_t final_size = offset + count;
-
+	if (rw == WRITE && (offset + count > inode->i_size)) { 
 		handle = ext3_journal_start(inode, DIO_CREDITS);
 		if (IS_ERR(handle)) {
 			ret = PTR_ERR(handle);
 			goto out;
 		}
-		if (final_size > inode->i_size) {
-			ret = ext3_orphan_add(handle, inode);
-			if (ret)
-				goto out_stop;
-			orphan = 1;
-			ei->i_disksize = inode->i_size;
-		}
-	}
-
+		ret = ext3_orphan_add(handle, inode);
+		if (ret) {
+			ext3_journal_stop(handle);
+			goto out;
+		}
+		ei->i_disksize = inode->i_size;
+		ret = ext3_journal_stop(handle);
+		if (ret) {
+			/* something has gone horribly wrong, cleanup
+			 * the orphan list in ram
+			 */
+			if (inode->i_nlink)
+				ext3_orphan_del(NULL, inode);
+			goto out;
+		}
+		orphan = 1;
+	}
+
+	/*
+	 * the placeholder page code may take a page lock, so we have
+	 * to stop any running transactions before calling
+	 * blockdev_direct_IO.  Use ext3_get_block_direct_IO to start
+	 * and stop a transaction on each get_block call.
+	 */
 	ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
 				 offset, nr_segs,
-				 ext3_get_block, NULL);
+				 ext3_get_block_direct_IO, NULL);
 
 	/*
 	 * Reacquire the handle: ext3_get_block() can restart the transaction
 	 */
 	handle = journal_current_handle();
 
-out_stop:
-	if (handle) {
+	if (orphan) {
 		int err;
-
-		if (orphan && inode->i_nlink)
+		handle = ext3_journal_start(inode, DIO_CREDITS);
+		if (IS_ERR(handle)) {
+			ret = PTR_ERR(handle);
+			if (inode->i_nlink)
+				ext3_orphan_del(NULL, inode);
+			goto out;
+		}
+
+		if (inode->i_nlink)
 			ext3_orphan_del(handle, inode);
-		if (orphan && ret > 0) {
+		if (ret > 0) {
 			loff_t end = offset + ret;
 			if (end > inode->i_size) {
 				ei->i_disksize = end;




* (unknown)
@ 2006-12-21 21:48 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 21:48 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:00 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 3 of 8] DIO: don't fall back to buffered writes
X-Mercurial-Node: ac51e7a4c7a66bc589e4e3640f5f822febab8be0
Message-Id: <ac51e7a4c7a66bc589e4.1166733299@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:34:59 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

Placeholder pages allow DIO to use locking rules similar to those of
writepage.  DIO can now fill holes, and it can extend the file via
get_block().

i_mutex can be dropped during writes if we are writing inside i_size.
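
As a rough sketch, the condition added to __blockdev_direct_IO in the
diff below can be written as a standalone predicate.  The enum and the
function name are invented for this illustration; the kernel code tests
rw, dio_lock_type, i_size_read() and is_sync_kiocb() directly.

```c
#include <assert.h>
#include <stdbool.h>

enum model_rw { MODEL_READ = 0, MODEL_WRITE = 1 };

/*
 * Mirrors the check in this patch: i_mutex may be dropped only for a
 * synchronous DIO_LOCKING write that ends at or before i_size.
 */
static bool model_can_drop_i_mutex(enum model_rw rw, bool dio_locking,
				   long long offset, long long len,
				   long long i_size, bool is_sync)
{
	long long end = offset + len;

	assert(offset >= 0 && len >= 0);

	return rw == MODEL_WRITE && dio_locking && is_sync && end <= i_size;
}
```

A write that would extend the file keeps i_mutex held, as do reads and
async writes in this version of the patch.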

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 317779b11fe1 -r ac51e7a4c7a6 fs/direct-io.c
--- a/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
@@ -70,6 +70,7 @@ struct dio {
 	int rw;
 	loff_t i_size;			/* i_size when submitted */
 	int lock_type;			/* doesn't change */
+	int reacquire_i_mutex;		/* should we get i_mutex when done? */
 	unsigned blkbits;		/* doesn't change */
 	unsigned blkfactor;		/* When we're using an alignment which
 					   is finer than the filesystem's soft
@@ -218,8 +219,7 @@ static int lock_page_range(struct dio *d
 		return 0;
 	return find_or_insert_placeholders(mapping, dio->tmppages, start, end,
 	                                  ARRAY_SIZE(dio->tmppages),
-					  GFP_KERNEL,
-					  dio->rw == READ);
+					  GFP_KERNEL, 1);
 }
 
 
@@ -282,6 +282,8 @@ static int dio_complete(struct dio *dio,
 	unlock_page_range(dio, dio->fspages_start_off,
 			  dio->fspages_end_off - dio->fspages_start_off);
 	dio->fspages_end_off = dio->fspages_start_off;
+	if (dio->reacquire_i_mutex)
+		mutex_lock(&dio->inode->i_mutex);
 
 	if (ret == 0)
 		ret = dio->page_errors;
@@ -568,13 +570,8 @@ static int get_more_blocks(struct dio *d
 		map_bh->b_size = fs_count << dio->inode->i_blkbits;
 
 		create = dio->rw & WRITE;
-		if (dio->lock_type == DIO_LOCKING) {
-			if (dio->block_in_file < (i_size_read(dio->inode) >>
-							dio->blkbits))
-				create = 0;
-		} else if (dio->lock_type == DIO_NO_LOCKING) {
+		if (dio->lock_type == DIO_NO_LOCKING)
 			create = 0;
-		}
 	        index = fs_startblk >> (PAGE_CACHE_SHIFT -
 		                        dio->inode->i_blkbits);
 		end = (dio->final_block_in_request >> dio->blkfactor) >>
@@ -1258,6 +1255,13 @@ __blockdev_direct_IO(int rw, struct kioc
 	dio->is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) &&
 		(end > i_size_read(inode)));
 
+	/* if our write is inside i_size, we can drop i_mutex */
+	dio->reacquire_i_mutex = 0;
+	if ((rw & WRITE) && dio_lock_type == DIO_LOCKING &&
+	   end <= i_size_read(inode) && is_sync_kiocb(iocb)) {
+		dio->reacquire_i_mutex = 1;
+		mutex_unlock(&inode->i_mutex);
+	}
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
 				nr_segs, blkbits, get_block, end_io, dio);
 out:
diff -r 317779b11fe1 -r ac51e7a4c7a6 mm/filemap.c
--- a/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
@@ -2865,10 +2865,19 @@ generic_file_direct_IO(int rw, struct ki
 	retval = mapping->a_ops->direct_IO(rw, iocb, iov,
 					offset, nr_segs);
 	if (rw == WRITE && mapping->nrpages) {
+		int err;
 		pgoff_t end = (offset + write_len - 1)
 					>> PAGE_CACHE_SHIFT;
-		int err = invalidate_inode_pages2_range(mapping,
-				offset >> PAGE_CACHE_SHIFT, end);
+
+		/* O_DIRECT is allowed to drop i_mutex, so more data
+		 * could have been dirtied by others.  Start io one more
+		 * time
+		 */
+		err = filemap_fdatawrite_range(mapping, offset,
+		                               offset + write_len - 1);
+		if (!err)
+			err = invalidate_inode_pages2_range(mapping,
+					offset >> PAGE_CACHE_SHIFT, end);
 		if (err)
 			retval = err;
 	}




* (unknown)
@ 2006-12-21 21:48 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 21:48 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:04 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 7 of 8] Adapt XFS to the new blockdev_direct_IO calls
X-Mercurial-Node: 3bd838f3dc060101c95ec82e6f66478a443120a7
Message-Id: <3bd838f3dc060101c95e.1166733303@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:03 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

XFS is changed to use blockdev_direct_IO flags instead of DIO_OWN_LOCKING.
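
For reference, the flag combinations used by the three callers in this
series can be tabulated in a few lines of user-space C.  The flag values
are copied from the include/linux/fs.h hunk of the series; the enum and
both model_* helpers are hypothetical, and model_create_arg() is only my
reading of the DIO_CREATE comment ("pass create=1 to get_block when
writing"), not the actual get_more_blocks logic.

```c
#include <assert.h>

/* Flag values as defined in the include/linux/fs.h hunk of this series */
#define DIO_PLACEHOLDERS	(1 << 0)  /* insert placeholder pages */
#define DIO_CREATE		(1 << 1)  /* pass create=1 to get_block when writing */
#define DIO_DROP_I_MUTEX	(1 << 2)  /* drop i_mutex during writes */

enum model_dio_caller { MODEL_DEFAULT, MODEL_NO_LOCKING, MODEL_XFS };

/* Flags each wrapper passes to blockdev_direct_IO_flags() */
static unsigned model_dio_flags(enum model_dio_caller caller)
{
	switch (caller) {
	case MODEL_DEFAULT:	/* blockdev_direct_IO() */
		return DIO_PLACEHOLDERS | DIO_CREATE | DIO_DROP_I_MUTEX;
	case MODEL_NO_LOCKING:	/* blockdev_direct_IO_no_locking() */
		return 0;
	case MODEL_XFS:		/* xfs_vm_direct_IO() after this patch */
		return DIO_CREATE;
	}
	return 0;
}

/* Assumed rule: create=1 only for writes with DIO_CREATE set */
static int model_create_arg(int is_write, unsigned flags)
{
	return (is_write && (flags & DIO_CREATE)) ? 1 : 0;
}
```

So XFS asks for hole filling on writes but opts out of both the
placeholder locking and the i_mutex dropping.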

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 5a06df98f46d -r 3bd838f3dc06 fs/xfs/linux-2.6/xfs_aops.c
--- a/fs/xfs/linux-2.6/xfs_aops.c	Thu Dec 21 15:31:31 2006 -0500
+++ b/fs/xfs/linux-2.6/xfs_aops.c	Thu Dec 21 15:31:31 2006 -0500
@@ -1392,19 +1392,16 @@ xfs_vm_direct_IO(
 
 	iocb->private = xfs_alloc_ioend(inode, IOMAP_UNWRITTEN);
 
-	if (rw == WRITE) {
-		ret = blockdev_direct_IO_own_locking(rw, iocb, inode,
-			iomap.iomap_target->bt_bdev,
-			iov, offset, nr_segs,
-			xfs_get_blocks_direct,
-			xfs_end_io_direct);
-	} else {
-		ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
-			iomap.iomap_target->bt_bdev,
-			iov, offset, nr_segs,
-			xfs_get_blocks_direct,
-			xfs_end_io_direct);
-	}
+	/*
+	 * ask DIO not to do any special locking for us, and to always
+	 * pass create=1 to get_block on writes
+	 */
+	ret = blockdev_direct_IO_flags(rw, iocb, inode,
+				       iomap.iomap_target->bt_bdev,
+				       iov, offset, nr_segs,
+				       xfs_get_blocks_direct,
+				       xfs_end_io_direct,
+				       DIO_CREATE);
 
 	if (unlikely(ret != -EIOCBQUEUED && iocb->private))
 		xfs_destroy_ioend(iocb->private);




* (unknown)
@ 2006-12-21 21:48 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 21:48 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:34:57 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 0 of 8] O_DIRECT locking rework v3
Message-Id: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:34:56 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

[ resend, sorry for the last set of messed up headers ]

I took a small detour on the O_DIRECT locking rework to look at
different alternatives for range locking in the pagecache.  After
benchmarking a few different types of trees, I didn't find anything that
would match radix for random lookup performance.

This patchset does ranges in the radix tree by inserting a placeholder
at the last slot in the range and forcing all lookups to search forward.
It means radix_tree_gang_lookup must be used instead of
radix_tree_lookup, but this is still faster for random searches than
anything else I tried.

A bit is set on the radix root node to indicate if range searching is
required.  So, when O_DIRECT isn't used or O_DIRECT is used for tiny
ios, no range lookups are done.

With O_DIRECT in use, only a single placeholder is inserted
to lock down the entire range for a given IO.  This should
stand up pretty well for those monster XFS workloads.

If the mapping has pages on it, I do one placeholder for every 64k
to limit the number of pages pinned down during the IO.  There's
lots of hand waving here; it may be best to get rid of this special
case.
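
A back-of-the-envelope sketch of the placeholder count, under loud
assumptions: 4k pages, 64k chunks counted from the first page of the IO
(no alignment to 64k boundaries), and a made-up function name.  The real
code may well split the range differently.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT	12		/* assumption: 4k pages */
#define MODEL_CHUNK_BYTES	(64 * 1024)	/* one placeholder per 64k */

/*
 * Number of placeholders for an IO of len bytes at offset.  With an
 * empty mapping a single placeholder covers the whole range.
 */
static unsigned long model_placeholder_count(unsigned long long offset,
					     unsigned long long len,
					     int mapping_has_pages)
{
	unsigned long long first_page, last_page, nr_pages;
	unsigned long pages_per_chunk = MODEL_CHUNK_BYTES >> MODEL_PAGE_SHIFT;

	if (len == 0)
		return 0;

	first_page = offset >> MODEL_PAGE_SHIFT;
	last_page = (offset + len - 1) >> MODEL_PAGE_SHIFT;
	nr_pages = last_page - first_page + 1;

	if (!mapping_has_pages)
		return 1;

	/* round up to whole 64k chunks */
	return (unsigned long)((nr_pages + pages_per_chunk - 1) /
			       pages_per_chunk);
}
```

Under these assumptions a 1MB IO needs one placeholder on an empty
mapping and 16 when the mapping already has pages.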

Patch against Linus' git from today.  Testing has been light, I'm
mostly looking for comments on the range locking tricks.

-chris




* (unknown)
@ 2006-12-21 21:48 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 21:48 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:34:58 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 1 of 8] Introduce a place holder page for the pagecache
X-Mercurial-Node: 4cac7e560b5342c0e5e2c45b2e036a936adedc2e
Message-Id: <4cac7e560b5342c0e5e2.1166733297@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:34:57 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

mm/filemap.c is changed to wait on these before adding a page into the page
cache, and truncates are changed to wait for all of the place holder pages to
disappear.

Place holder pages can only be examined with the mapping lock held.  They
cannot be locked, and cannot have references increased or decreased on them.

Placeholders can span a range bigger than one page.  The placeholder is
inserted into the radix slot for the end of the range, and the flags field in
the page struct is used to record the start of the range.

A bit is added for the radix root (PAGE_CACHE_TAG_EXTENTS), and when
mm/filemap.c finds that bit set, searches for an index in the pagecache
look forward to find any placeholders that index may intersect.
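
The forward search can be modelled in user space with a sorted array
standing in for the radix tree.  This is only a sketch of the rule used
by radix_tree_lookup_extent() in the diff below: the struct and both
model_* functions are invented for illustration, the linear scan stands
in for radix_tree_gang_lookup() with max_items of 1, and the root tag
fast path is left out.

```c
#include <assert.h>
#include <stddef.h>

struct model_entry {
	unsigned long index;	/* tree slot; range end for placeholders */
	unsigned long start;	/* range start (placeholders only) */
	int placeholder;	/* nonzero for a placeholder entry */
};

/* First entry whose slot is >= index; entries must be sorted by index */
static const struct model_entry *
model_first_at_or_after(const struct model_entry *e, size_t n,
			unsigned long index)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (e[i].index >= index)
			return &e[i];
	return NULL;
}

/* Return the page or placeholder covering index, or NULL */
static const struct model_entry *
model_lookup_extent(const struct model_entry *e, size_t n,
		    unsigned long index)
{
	const struct model_entry *p = model_first_at_or_after(e, n, index);

	if (!p)
		return NULL;
	if (p->placeholder)	/* p->index >= index holds already */
		return p->start <= index ? p : NULL;
	return p->index == index ? p : NULL;
}
```

Because the placeholder sits in the slot for the end of its range, any
index inside the range finds it as the first entry at or after that
index; an ordinary page only matches on its exact slot.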

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 511f067627ac -r 4cac7e560b53 drivers/mtd/devices/block2mtd.c
--- a/drivers/mtd/devices/block2mtd.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/drivers/mtd/devices/block2mtd.c	Thu Dec 21 15:31:30 2006 -0500
@@ -66,7 +66,7 @@ static void cache_readahead(struct addre
 			INFO("Overrun end of disk in cache readahead\n");
 			break;
 		}
-		page = radix_tree_lookup(&mapping->page_tree, pagei);
+		page = radix_tree_lookup_extent(&mapping->page_tree, pagei);
 		if (page && (!i))
 			break;
 		if (page)
diff -r 511f067627ac -r 4cac7e560b53 include/linux/fs.h
--- a/include/linux/fs.h	Thu Dec 21 00:20:01 2006 -0800
+++ b/include/linux/fs.h	Thu Dec 21 15:31:30 2006 -0500
@@ -489,6 +489,11 @@ struct block_device {
  */
 #define PAGECACHE_TAG_DIRTY	0
 #define PAGECACHE_TAG_WRITEBACK	1
+
+/*
+ * This tag is only valid on the root of the radix tree
+ */
+#define PAGE_CACHE_TAG_EXTENTS 2
 
 int mapping_tagged(struct address_space *mapping, int tag);
 
diff -r 511f067627ac -r 4cac7e560b53 include/linux/page-flags.h
--- a/include/linux/page-flags.h	Thu Dec 21 00:20:01 2006 -0800
+++ b/include/linux/page-flags.h	Thu Dec 21 15:31:30 2006 -0500
@@ -267,4 +267,6 @@ static inline void set_page_writeback(st
 	test_set_page_writeback(page);
 }
 
+void set_page_placeholder(struct page *page, pgoff_t start, pgoff_t end);
+
 #endif	/* PAGE_FLAGS_H */
diff -r 511f067627ac -r 4cac7e560b53 include/linux/pagemap.h
--- a/include/linux/pagemap.h	Thu Dec 21 00:20:01 2006 -0800
+++ b/include/linux/pagemap.h	Thu Dec 21 15:31:30 2006 -0500
@@ -76,6 +76,11 @@ extern struct page * find_get_page(struc
 				unsigned long index);
 extern struct page * find_lock_page(struct address_space *mapping,
 				unsigned long index);
+int find_or_insert_placeholders(struct address_space *mapping,
+                                  struct page **pages, unsigned long start,
+                                  unsigned long end, unsigned long nr,
+                                  gfp_t gfp_mask,
+                                  int wait);
 extern __deprecated_for_modules struct page * find_trylock_page(
 			struct address_space *mapping, unsigned long index);
 extern struct page * find_or_create_page(struct address_space *mapping,
@@ -86,6 +91,15 @@ unsigned find_get_pages_contig(struct ad
 			       unsigned int nr_pages, struct page **pages);
 unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
 			int tag, unsigned int nr_pages, struct page **pages);
+void remove_placeholder_pages(struct address_space *mapping,
+                             struct page **pages,
+                             unsigned long offset,
+                             unsigned long end,
+                             unsigned long nr);
+void wake_up_placeholder_page(struct page *page);
+void wait_on_placeholder_pages_range(struct address_space *mapping, pgoff_t start,
+			       pgoff_t end);
+
 
 /*
  * Returns locked page at given index in given cache, creating it if needed.
@@ -116,6 +130,8 @@ int add_to_page_cache_lru(struct page *p
 				unsigned long index, gfp_t gfp_mask);
 extern void remove_from_page_cache(struct page *page);
 extern void __remove_from_page_cache(struct page *page);
+struct page *radix_tree_lookup_extent(struct radix_tree_root *root,
+					     unsigned long index);
 
 /*
  * Return byte-offset into filesystem object for page.
diff -r 511f067627ac -r 4cac7e560b53 include/linux/radix-tree.h
--- a/include/linux/radix-tree.h	Thu Dec 21 00:20:01 2006 -0800
+++ b/include/linux/radix-tree.h	Thu Dec 21 15:31:30 2006 -0500
@@ -53,6 +53,7 @@ static inline int radix_tree_is_direct_p
 /*** radix-tree API starts here ***/
 
 #define RADIX_TREE_MAX_TAGS 2
+#define RADIX_TREE_MAX_ROOT_TAGS 3
 
 /* root tags are stored in gfp_mask, shifted by __GFP_BITS_SHIFT */
 struct radix_tree_root {
@@ -168,6 +169,7 @@ radix_tree_gang_lookup_tag(struct radix_
 		unsigned long first_index, unsigned int max_items,
 		unsigned int tag);
 int radix_tree_tagged(struct radix_tree_root *root, unsigned int tag);
+void radix_tree_root_tag_set(struct radix_tree_root *root, unsigned int tag);
 
 static inline void radix_tree_preload_end(void)
 {
diff -r 511f067627ac -r 4cac7e560b53 lib/radix-tree.c
--- a/lib/radix-tree.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/lib/radix-tree.c	Thu Dec 21 15:31:30 2006 -0500
@@ -468,6 +468,12 @@ void *radix_tree_tag_set(struct radix_tr
 	return slot;
 }
 EXPORT_SYMBOL(radix_tree_tag_set);
+
+void radix_tree_root_tag_set(struct radix_tree_root *root, unsigned int tag)
+{
+	root_tag_set(root, tag);
+}
+EXPORT_SYMBOL(radix_tree_root_tag_set);
 
 /**
  *	radix_tree_tag_clear - clear a tag on a radix tree node
diff -r 511f067627ac -r 4cac7e560b53 mm/filemap.c
--- a/mm/filemap.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
@@ -44,6 +44,14 @@ generic_file_direct_IO(int rw, struct ki
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	loff_t offset, unsigned long nr_segs);
 
+static wait_queue_head_t *page_waitqueue(struct page *page);
+static void wait_on_placeholder_page(struct address_space *mapping,
+				     struct page *page,
+				     int write_lock);
+
+static struct address_space placeholder_address_space;
+#define PagePlaceHolder(page) ((page)->mapping == &placeholder_address_space)
+
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
  * though.
@@ -421,6 +429,41 @@ int filemap_write_and_wait_range(struct 
 	return err;
 }
 
+/*
+ * When the radix tree has the extent bit set, a lookup needs to search
+ * forward in the tree to find any extent the index might intersect.
+ * When extents are off, a faster radix_tree_lookup can be done instead.
+ *
+ * This does the appropriate lookup based on the PAGE_CACHE_TAG_EXTENTS
+ * bit on the root node
+ */
+struct page *radix_tree_lookup_extent(struct radix_tree_root *root,
+					     unsigned long index)
+{
+	if (radix_tree_tagged(root, PAGE_CACHE_TAG_EXTENTS)) {
+		struct page *p;
+		unsigned int found;
+		found = radix_tree_gang_lookup(root, (void **)(&p), index, 1);
+		if (found) {
+			if (PagePlaceHolder(p)) {
+				pgoff_t start = p->flags;
+				pgoff_t end = p->index;
+				if (end >= index && start <= index)
+					return p;
+				return NULL;
+			} else {
+				if (p->index == index) {
+					return p;
+				}
+				return NULL;
+			}
+		} else
+			return NULL;
+	}
+	return radix_tree_lookup(root, index);
+}
+
+
 /**
  * add_to_page_cache - add newly allocated pagecache pages
  * @page:	page to add
@@ -437,12 +480,38 @@ int add_to_page_cache(struct page *page,
 int add_to_page_cache(struct page *page, struct address_space *mapping,
 		pgoff_t offset, gfp_t gfp_mask)
 {
-	int error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+	int error;
+again:
+	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
 
 	if (error == 0) {
+		struct page *tmp;
 		write_lock_irq(&mapping->tree_lock);
+		/*
+		 * If extents are on for this radix tree, we have to do
+		 * the more expensive search for an overlapping extent
+		 * before we try to insert.
+		 */
+		if (radix_tree_tagged(&mapping->page_tree,
+				      PAGE_CACHE_TAG_EXTENTS)) {
+			tmp = radix_tree_lookup_extent(&mapping->page_tree,
+						       offset);
+			if (tmp && PagePlaceHolder(tmp))
+				goto exists;
+		}
 		error = radix_tree_insert(&mapping->page_tree, offset, page);
-		if (!error) {
+		if (error == -EEXIST && (gfp_mask & __GFP_WAIT)) {
+			tmp = radix_tree_lookup_extent(&mapping->page_tree,
+						       offset);
+			if (tmp && PagePlaceHolder(tmp)) {
+exists:
+				radix_tree_preload_end();
+				/* drops the lock */
+				wait_on_placeholder_page(mapping, tmp, 1);
+				goto again;
+			}
+		}
+		if (!error && !PagePlaceHolder(page)) {
 			page_cache_get(page);
 			SetPageLocked(page);
 			page->mapping = mapping;
@@ -516,6 +585,92 @@ void fastcall wait_on_page_bit(struct pa
 }
 EXPORT_SYMBOL(wait_on_page_bit);
 
+/*
+ * Call with either a read lock or a write lock on the mapping tree.
+ *
+ * placeholder pages can't be tested or checked without the tree lock held
+ *
+ * In order to wait for the placeholders without losing a wakeup from someone
+ * removing them, we have to prepare_to_wait before dropping the tree lock.
+ *
+ * The lock is dropped just before waiting for the place holder.  It is not
+ * retaken before returning.
+ */
+static void wait_on_placeholder_page(struct address_space *mapping,
+				     struct page *page,
+				     int write_lock)
+{
+	DEFINE_WAIT(wait);
+	wait_queue_head_t *wqh = page_waitqueue(page);
+	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
+	if (write_lock)
+		write_unlock_irq(&mapping->tree_lock);
+	else
+		read_unlock_irq(&mapping->tree_lock);
+	io_schedule();
+	finish_wait(wqh, &wait);
+}
+
+void wake_up_placeholder_page(struct page *page)
+{
+	__wake_up_bit(page_waitqueue(page), &page->flags, PG_locked);
+}
+EXPORT_SYMBOL_GPL(wake_up_placeholder_page);
+
+/**
+ * wait_on_placeholder_pages_range - gang placeholder page waiter
+ * @mapping:	The address_space to search
+ * @start:	The starting page index
+ * @end:	The max page index (inclusive)
+ *
+ * wait_on_placeholder_pages_range() will search for and wait on any
+ * placeholder pages in the given range of the mapping.
+ *
+ * On return, the range has no placeholder pages sitting in it.
+ */
+void wait_on_placeholder_pages_range(struct address_space *mapping,
+			       pgoff_t start, pgoff_t end)
+{
+	unsigned int i;
+	unsigned int ret;
+	struct page *pages[8];
+	pgoff_t cur = start;
+	pgoff_t highest = start;
+
+	/*
+	 * we expect a very small number of place holder pages, so
+	 * this code isn't trying to be very fast.
+	 */
+again:
+	read_lock_irq(&mapping->tree_lock);
+	ret = radix_tree_gang_lookup(&mapping->page_tree,
+				(void **)pages, cur, ARRAY_SIZE(pages));
+	for (i = 0; i < ret; i++) {
+		if (PagePlaceHolder(pages[i])) {
+			if (pages[i]->flags > end)
+				goto done;
+			/* drops the lock */
+			wait_on_placeholder_page(mapping, pages[i], 0);
+			goto again;
+		}
+		if (pages[i]->index > highest)
+			highest = pages[i]->index;
+		if (pages[i]->index > end)
+			goto done;
+	}
+	if (highest < end && ret == ARRAY_SIZE(pages)) {
+		cur = highest;
+		/* "again" retakes the lock, so always drop it here */
+		read_unlock_irq(&mapping->tree_lock);
+		if (need_resched())
+			cond_resched();
+		goto again;
+	}
+done:
+	read_unlock_irq(&mapping->tree_lock);
+}
+EXPORT_SYMBOL_GPL(wait_on_placeholder_pages_range);
+
 /**
  * unlock_page - unlock a locked page
  * @page: the page
@@ -532,6 +687,7 @@ EXPORT_SYMBOL(wait_on_page_bit);
  */
 void fastcall unlock_page(struct page *page)
 {
+	BUG_ON(PagePlaceHolder(page));
 	smp_mb__before_clear_bit();
 	if (!TestClearPageLocked(page))
 		BUG();
@@ -568,6 +724,7 @@ void fastcall __lock_page(struct page *p
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
+	BUG_ON(PagePlaceHolder(page));
 	__wait_on_bit_lock(page_waitqueue(page), &wait, sync_page,
 							TASK_UNINTERRUPTIBLE);
 }
@@ -580,6 +737,7 @@ void fastcall __lock_page_nosync(struct 
 void fastcall __lock_page_nosync(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
+	BUG_ON(PagePlaceHolder(page));
 	__wait_on_bit_lock(page_waitqueue(page), &wait, __sleep_on_page_lock,
 							TASK_UNINTERRUPTIBLE);
 }
@@ -597,13 +755,281 @@ struct page * find_get_page(struct addre
 	struct page *page;
 
 	read_lock_irq(&mapping->tree_lock);
-	page = radix_tree_lookup(&mapping->page_tree, offset);
-	if (page)
-		page_cache_get(page);
+	page = radix_tree_lookup_extent(&mapping->page_tree, offset);
+	if (page) {
+		if (PagePlaceHolder(page))
+			page = NULL;
+		else
+			page_cache_get(page);
+	}
 	read_unlock_irq(&mapping->tree_lock);
 	return page;
 }
 EXPORT_SYMBOL(find_get_page);
+
+/**
+ * remove_placeholder_pages - remove a range of placeholder or locked pages
+ * @mapping: the page's address_space
+ * @pages: an array of page pointers to use for gang lookups
+ * @start: the search starting point
+ * @end: the search end point (offsets >= end are not touched)
+ * @nr: the size of the pages array.
+ *
+ * Any placeholder pages in the range specified are removed from the
+ * radix tree, their waiters are woken, and they are freed.  Any real
+ * pages are unlocked and released.
+ */
+void remove_placeholder_pages(struct address_space *mapping,
+			     struct page **pages,
+			     unsigned long start,
+			     unsigned long end,
+			     unsigned long nr)
+{
+	struct page *page;
+	int ret;
+	int i;
+	unsigned long num;
+
+	write_lock_irq(&mapping->tree_lock);
+	while (start < end) {
+		num = min(nr, end-start);
+		ret = radix_tree_gang_lookup(&mapping->page_tree,
+						(void **)pages, start, num);
+		for (i = 0; i < ret; i++) {
+			page = pages[i];
+			if (PagePlaceHolder(page)) {
+				if (page->index >= end)
+					break;
+				radix_tree_delete(&mapping->page_tree,
+						  page->index);
+				start = page->index + 1;
+				wake_up_placeholder_page(page);
+				kfree(page);
+			} else {
+				if (page->index >= end)
+					break;
+				unlock_page(page);
+				page_cache_release(page);
+				start = page->index + 1;
+			}
+		}
+		if (need_resched()) {
+			write_unlock_irq(&mapping->tree_lock);
+			cond_resched();
+			write_lock_irq(&mapping->tree_lock);
+		}
+	}
+	write_unlock_irq(&mapping->tree_lock);
+}
+EXPORT_SYMBOL_GPL(remove_placeholder_pages);
+
+/*
+ * a helper function to insert a placeholder covering a range of
+ * offsets.  The placeholder goes into the single radix tree slot for
+ * the end of the range (insert->index); the start of the range is
+ * kept in insert->flags.  Call with the tree lock held for writing
+ * and the radix tree preloaded.
+ *
+ * returns zero on success or the error from radix_tree_insert().
+ */
+static int insert_placeholder(struct address_space *mapping,
+					 struct page *insert)
+{
+	int err;
+	unsigned int found;
+	struct page *debug_page;
+	/* sanity check, make sure other extents don't exist in this range */
+	found = radix_tree_gang_lookup(&mapping->page_tree,
+				    (void **)(&debug_page),
+				    insert->flags, 1);
+	BUG_ON(found > 0 && debug_page->flags <= (insert->index));
+	err = radix_tree_insert(&mapping->page_tree, insert->index, insert);
+	return err;
+}
+
+
+static struct page *alloc_placeholder(gfp_t gfp_mask)
+{
+	struct page *p;
+	p = kmalloc(sizeof(*p), gfp_mask);
+	if (p) {
+		memset(p, 0, sizeof(*p));
+		p->mapping = &placeholder_address_space;
+	}
+	return p;
+}
+
+/**
+ * find_or_insert_placeholders - locate a group of pagecache pages or insert one
+ * @mapping: the page's address_space
+ * @pages: an array of page pointers to use for gang lookups
+ * @start: the search starting point
+ * @end: the search end point (offsets >= end are not touched)
+ * @nr: the size of the pages array.
+ * @gfp_mask: page allocation mode
+ * @iowait: 1 if you want to wait for dirty or writeback pages.
+ *
+ * This locks down a range of offsets in the address space.  Any pages
+ * already present are locked and a placeholder page is inserted into
+ * the radix tree for any offsets without pages.  Returns zero on
+ * success or a negative errno.
+ */
+int find_or_insert_placeholders(struct address_space *mapping,
+				  struct page **pages, unsigned long start,
+				  unsigned long end, unsigned long nr,
+				  gfp_t gfp_mask,
+				  int iowait)
+{
+	int err = 0;
+	int i, ret;
+	unsigned long cur = start;
+	struct page *page;
+	int restart;
+	struct page *insert = NULL;
+	/*
+	 * this gets complicated.  Placeholders and page locks need to
+	 * be taken in order.  We use gang lookup to cut down on the cpu
+	 * cost, but we need to keep track of holes in the results and
+	 * insert placeholders as appropriate.
+	 *
+	 * If a locked page or a placeholder is found, we wait for it and
+	 * pick up where we left off.  If a dirty or PG_writeback page is found
+	 * and iowait==1, we have to drop all of our locks, kick/wait for the
+	 * io and resume again.
+	 */
+repeat:
+	if (!insert) {
+		insert = alloc_placeholder(gfp_mask);
+		if (!insert) {
+			err = -ENOMEM;
+			goto fail;
+		}
+	}
+	if (cur != start )
+		cond_resched();
+	err = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+	if (err)
+		goto fail;
+	write_lock_irq(&mapping->tree_lock);
+
+	/* only set the extent tag if we are inserting placeholders for more
+	 * than one page worth of slots.  This way small random ios don't
+	 * suffer from slower lookups.
+	 */
+	if (cur == start && end - start > 1)
+		radix_tree_root_tag_set(&mapping->page_tree,
+					PAGE_CACHE_TAG_EXTENTS);
+repeat_lock:
+	ret = radix_tree_gang_lookup(&mapping->page_tree,
+					(void **)pages, cur,
+					min(nr, end-cur));
+	for (i = 0 ; i < ret ; i++) {
+		restart = 0;
+		page = pages[i];
+
+		if (PagePlaceHolder(page) && page->flags < end) {
+			radix_tree_preload_end();
+			/* drops the lock */
+			wait_on_placeholder_page(mapping, page, 1);
+			goto repeat;
+		}
+
+		if (page->index > cur) {
+			unsigned long top = min(end, page->index);
+			insert->index = top - 1;
+			insert->flags = cur;
+			err = insert_placeholder(mapping, insert);
+			write_unlock_irq(&mapping->tree_lock);
+			radix_tree_preload_end();
+			insert = NULL;
+			if (err)
+				goto fail;
+			cur = top;
+			if (cur < end)
+				goto repeat;
+			else
+				goto done;
+		}
+		if (page->index >= end) {
+			ret = 0;
+			break;
+		}
+		page_cache_get(page);
+		BUG_ON(page->index != cur);
+		BUG_ON(PagePlaceHolder(page));
+		if (TestSetPageLocked(page)) {
+			unsigned long tmpoff = page->index;
+			page_cache_get(page);
+			write_unlock_irq(&mapping->tree_lock);
+			radix_tree_preload_end();
+			__lock_page(page);
+			/* Has the page been truncated while we slept? */
+			if (unlikely(page->mapping != mapping ||
+				     page->index != tmpoff)) {
+				unlock_page(page);
+				page_cache_release(page);
+				goto repeat;
+			} else {
+				/* we've locked the page, but  we need to
+				 *  check it for dirty/writeback
+				 */
+				restart = 1;
+			}
+		}
+		if (iowait && (PageDirty(page) || PageWriteback(page))) {
+			unlock_page(page);
+			page_cache_release(page);
+			if (!restart) {
+				write_unlock_irq(&mapping->tree_lock);
+				radix_tree_preload_end();
+			}
+			err = filemap_write_and_wait_range(mapping,
+						 cur << PAGE_CACHE_SHIFT,
+						 end << PAGE_CACHE_SHIFT);
+			if (err)
+				goto fail;
+			goto repeat;
+		}
+		cur++;
+		if (restart)
+			goto repeat;
+		if (cur >= end)
+			break;
+	}
+
+	/* we haven't yet filled the range */
+	if (cur < end) {
+		/* if the search filled our array, there is more to do. */
+		if (ret && ret == nr)
+			goto repeat_lock;
+
+		/* otherwise insert placeholders for the remaining offsets */
+		insert->index = end - 1;
+		insert->flags = cur;
+		err = insert_placeholder(mapping, insert);
+		write_unlock_irq(&mapping->tree_lock);
+		radix_tree_preload_end();
+		if (err)
+			goto fail;
+		insert = NULL;
+		cur = end;
+	} else {
+		write_unlock_irq(&mapping->tree_lock);
+		radix_tree_preload_end();
+	}
+done:
+	BUG_ON(cur < end);
+	BUG_ON(cur > end);
+	if (insert)
+		kfree(insert);
+	return err;
+fail:
+	remove_placeholder_pages(mapping, pages, start, cur, nr);
+	if (insert)
+		kfree(insert);
+	return err;
+}
+EXPORT_SYMBOL_GPL(find_or_insert_placeholders);
 
 /**
  * find_trylock_page - find and lock a page
@@ -617,8 +1043,8 @@ struct page *find_trylock_page(struct ad
 	struct page *page;
 
 	read_lock_irq(&mapping->tree_lock);
-	page = radix_tree_lookup(&mapping->page_tree, offset);
-	if (page && TestSetPageLocked(page))
+	page = radix_tree_lookup_extent(&mapping->page_tree, offset);
+	if (page && (PagePlaceHolder(page) || TestSetPageLocked(page)))
 		page = NULL;
 	read_unlock_irq(&mapping->tree_lock);
 	return page;
@@ -642,8 +1068,14 @@ struct page *find_lock_page(struct addre
 
 	read_lock_irq(&mapping->tree_lock);
 repeat:
-	page = radix_tree_lookup(&mapping->page_tree, offset);
+	page = radix_tree_lookup_extent(&mapping->page_tree, offset);
 	if (page) {
+		if (PagePlaceHolder(page)) {
+			/* drops the lock */
+			wait_on_placeholder_page(mapping, page, 0);
+			read_lock_irq(&mapping->tree_lock);
+			goto repeat;
+		}
 		page_cache_get(page);
 		if (TestSetPageLocked(page)) {
 			read_unlock_irq(&mapping->tree_lock);
@@ -727,14 +1159,25 @@ unsigned find_get_pages(struct address_s
 unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
 			    unsigned int nr_pages, struct page **pages)
 {
-	unsigned int i;
+	unsigned int i = 0;
 	unsigned int ret;
 
 	read_lock_irq(&mapping->tree_lock);
 	ret = radix_tree_gang_lookup(&mapping->page_tree,
 				(void **)pages, start, nr_pages);
-	for (i = 0; i < ret; i++)
-		page_cache_get(pages[i]);
+	while(i < ret) {
+		if (PagePlaceHolder(pages[i])) {
+			/* we can't return a place holder, shift it away */
+			if (i + 1 < ret) {
+				memmove(&pages[i], &pages[i+1],
+					(ret - i - 1) * sizeof(struct page *));
+			}
+			ret--;
+			continue;
+		} else
+			page_cache_get(pages[i]);
+		i++;
+	}
 	read_unlock_irq(&mapping->tree_lock);
 	return ret;
 }
@@ -761,6 +1204,8 @@ unsigned find_get_pages_contig(struct ad
 	ret = radix_tree_gang_lookup(&mapping->page_tree,
 				(void **)pages, index, nr_pages);
 	for (i = 0; i < ret; i++) {
+		if (PagePlaceHolder(pages[i]))
+			break;
 		if (pages[i]->mapping == NULL || pages[i]->index != index)
 			break;
 
@@ -785,14 +1230,25 @@ unsigned find_get_pages_tag(struct addre
 unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
 			int tag, unsigned int nr_pages, struct page **pages)
 {
-	unsigned int i;
+	unsigned int i = 0;
 	unsigned int ret;
 
 	read_lock_irq(&mapping->tree_lock);
 	ret = radix_tree_gang_lookup_tag(&mapping->page_tree,
 				(void **)pages, *index, nr_pages, tag);
-	for (i = 0; i < ret; i++)
-		page_cache_get(pages[i]);
+	while(i < ret) {
+		if (PagePlaceHolder(pages[i])) {
+			/* we can't return a place holder, shift it away */
+			if (i + 1 < ret) {
+				memmove(&pages[i], &pages[i+1],
+					(ret - i - 1) * sizeof(struct page *));
+			}
+			ret--;
+			continue;
+		} else
+			page_cache_get(pages[i]);
+		i++;
+	}
 	if (ret)
 		*index = pages[ret - 1]->index + 1;
 	read_unlock_irq(&mapping->tree_lock);
@@ -2406,18 +2862,15 @@ generic_file_direct_IO(int rw, struct ki
 			unmap_mapping_range(mapping, offset, write_len, 0);
 	}
 
-	retval = filemap_write_and_wait(mapping);
-	if (retval == 0) {
-		retval = mapping->a_ops->direct_IO(rw, iocb, iov,
-						offset, nr_segs);
-		if (rw == WRITE && mapping->nrpages) {
-			pgoff_t end = (offset + write_len - 1)
-						>> PAGE_CACHE_SHIFT;
-			int err = invalidate_inode_pages2_range(mapping,
-					offset >> PAGE_CACHE_SHIFT, end);
-			if (err)
-				retval = err;
-		}
+	retval = mapping->a_ops->direct_IO(rw, iocb, iov,
+					offset, nr_segs);
+	if (rw == WRITE && mapping->nrpages) {
+		pgoff_t end = (offset + write_len - 1)
+					>> PAGE_CACHE_SHIFT;
+		int err = invalidate_inode_pages2_range(mapping,
+				offset >> PAGE_CACHE_SHIFT, end);
+		if (err)
+			retval = err;
 	}
 	return retval;
 }
diff -r 511f067627ac -r 4cac7e560b53 mm/migrate.c
--- a/mm/migrate.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/mm/migrate.c	Thu Dec 21 15:31:30 2006 -0500
@@ -305,8 +305,12 @@ static int migrate_page_move_mapping(str
 
 	write_lock_irq(&mapping->tree_lock);
 
+	/*
+	 * we don't need to worry about placeholders here,
+	 * the slot in the tree is verified
+	 */
 	pslot = radix_tree_lookup_slot(&mapping->page_tree,
- 					page_index(page));
+					page_index(page));
 
 	if (page_count(page) != 2 + !!PagePrivate(page) ||
 			(struct page *)radix_tree_deref_slot(pslot) != page) {
diff -r 511f067627ac -r 4cac7e560b53 mm/readahead.c
--- a/mm/readahead.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/mm/readahead.c	Thu Dec 21 15:31:30 2006 -0500
@@ -288,7 +288,8 @@ __do_page_cache_readahead(struct address
 		if (page_offset > end_index)
 			break;
 
-		page = radix_tree_lookup(&mapping->page_tree, page_offset);
+		page = radix_tree_lookup_extent(&mapping->page_tree,
+						page_offset);
 		if (page)
 			continue;
 
diff -r 511f067627ac -r 4cac7e560b53 mm/truncate.c
--- a/mm/truncate.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/mm/truncate.c	Thu Dec 21 15:31:30 2006 -0500
@@ -209,6 +209,7 @@ void truncate_inode_pages_range(struct a
 		}
 		pagevec_release(&pvec);
 	}
+	wait_on_placeholder_pages_range(mapping, start, end);
 }
 EXPORT_SYMBOL(truncate_inode_pages_range);
 




* (unknown)
@ 2006-12-21 20:56 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 20:56 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:03 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 6 of 8] Make reiserfs safe for new DIO locking rules
X-Mercurial-Node: 5a06df98f46d0b2d44421f92467cbb25812f6677
Message-Id: <5a06df98f46d0b2d4442.1166733302@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:02 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

reiserfs is changed to use a version of reiserfs_get_block that is safe
for filling holes without i_mutex held.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r bebaf8972a31 -r 5a06df98f46d fs/reiserfs/inode.c
--- a/fs/reiserfs/inode.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/reiserfs/inode.c	Thu Dec 21 15:31:31 2006 -0500
@@ -469,7 +469,8 @@ static int reiserfs_get_blocks_direct_io
 	bh_result->b_size = (1 << inode->i_blkbits);
 
 	ret = reiserfs_get_block(inode, iblock, bh_result,
-				 create | GET_BLOCK_NO_DANGLE);
+				 create | GET_BLOCK_NO_DANGLE |
+				 GET_BLOCK_NO_IMUX);
 	if (ret)
 		goto out;
 




* (unknown)
@ 2006-12-21 20:56 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 20:56 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:34:58 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 1 of 8] Introduce a place holder page for the pagecache
X-Mercurial-Node: 4cac7e560b5342c0e5e2c45b2e036a936adedc2e
Message-Id: <4cac7e560b5342c0e5e2.1166733297@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:34:57 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

mm/filemap.c is changed to wait on place holder pages before adding a page
into the page cache, and truncates are changed to wait for all of the place
holder pages to disappear.

Place holder pages can only be examined with the mapping lock held.  They
cannot be locked, and cannot have references increased or decreased on them.

Placeholders can span a range bigger than one page.  The placeholder is
inserted into the radix slot for the end of the range, and the flags field in
the page struct is used to record the start of the range.
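
As a rough illustration of that encoding (a userspace sketch only, not code
from this patch; struct placeholder_model and placeholder_covers() are
made-up names), the range test comes down to comparing an index against the
two reused fields:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two struct page fields a placeholder
 * reuses.  The real patch stores these in page->flags and page->index.
 */
struct placeholder_model {
	unsigned long flags;	/* first page index covered by the range */
	unsigned long index;	/* last page index covered; also the radix slot */
};

/* True when idx falls inside the placeholder's inclusive range. */
static bool placeholder_covers(const struct placeholder_model *p,
			       unsigned long idx)
{
	return p->flags <= idx && idx <= p->index;
}
```

Storing the placeholder at the end of the range means a forward search
starting at any covered index reaches it first.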

A tag bit is added to the radix root (PAGE_CACHE_TAG_EXTENTS).  When
mm/filemap.c finds that bit set, a search for an index in the pagecache
looks forward in the tree to find any placeholder that index may intersect.
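
The forward search can be modelled in userspace roughly as follows.  This is
a sketch under assumptions: the radix tree is replaced by an array sorted by
slot, and entry_model, first_at_or_after() and lookup_extent_model() are
hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one radix tree entry. */
struct entry_model {
	unsigned long slot;	/* tree key: page index, or range end */
	unsigned long start;	/* placeholders only: first index covered */
	bool placeholder;
};

/* Stand-in for a gang lookup of one item: first entry at or after index. */
static const struct entry_model *
first_at_or_after(const struct entry_model *tree, size_t n,
		  unsigned long index)
{
	for (size_t i = 0; i < n; i++)
		if (tree[i].slot >= index)
			return &tree[i];
	return NULL;
}

/*
 * Return the entry that index hits: a real page stored exactly at index,
 * or a placeholder whose range contains it.  Anything else is a miss.
 */
static const struct entry_model *
lookup_extent_model(const struct entry_model *tree, size_t n,
		    unsigned long index)
{
	const struct entry_model *e = first_at_or_after(tree, n, index);

	if (!e)
		return NULL;
	if (e->placeholder)
		return e->start <= index ? e : NULL;
	return e->slot == index ? e : NULL;
}
```

With a real page at slot 2 and a placeholder covering 4..9 stored at slot 9,
index 6 finds the placeholder while index 3 misses both.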

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 511f067627ac -r 4cac7e560b53 drivers/mtd/devices/block2mtd.c
--- a/drivers/mtd/devices/block2mtd.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/drivers/mtd/devices/block2mtd.c	Thu Dec 21 15:31:30 2006 -0500
@@ -66,7 +66,7 @@ static void cache_readahead(struct addre
 			INFO("Overrun end of disk in cache readahead\n");
 			break;
 		}
-		page = radix_tree_lookup(&mapping->page_tree, pagei);
+		page = radix_tree_lookup_extent(&mapping->page_tree, pagei);
 		if (page && (!i))
 			break;
 		if (page)
diff -r 511f067627ac -r 4cac7e560b53 include/linux/fs.h
--- a/include/linux/fs.h	Thu Dec 21 00:20:01 2006 -0800
+++ b/include/linux/fs.h	Thu Dec 21 15:31:30 2006 -0500
@@ -489,6 +489,11 @@ struct block_device {
  */
 #define PAGECACHE_TAG_DIRTY	0
 #define PAGECACHE_TAG_WRITEBACK	1
+
+/*
+ * This tag is only valid on the root of the radix tree
+ */
+#define PAGE_CACHE_TAG_EXTENTS 2
 
 int mapping_tagged(struct address_space *mapping, int tag);
 
diff -r 511f067627ac -r 4cac7e560b53 include/linux/page-flags.h
--- a/include/linux/page-flags.h	Thu Dec 21 00:20:01 2006 -0800
+++ b/include/linux/page-flags.h	Thu Dec 21 15:31:30 2006 -0500
@@ -267,4 +267,6 @@ static inline void set_page_writeback(st
 	test_set_page_writeback(page);
 }
 
+void set_page_placeholder(struct page *page, pgoff_t start, pgoff_t end);
+
 #endif	/* PAGE_FLAGS_H */
diff -r 511f067627ac -r 4cac7e560b53 include/linux/pagemap.h
--- a/include/linux/pagemap.h	Thu Dec 21 00:20:01 2006 -0800
+++ b/include/linux/pagemap.h	Thu Dec 21 15:31:30 2006 -0500
@@ -76,6 +76,11 @@ extern struct page * find_get_page(struc
 				unsigned long index);
 extern struct page * find_lock_page(struct address_space *mapping,
 				unsigned long index);
+int find_or_insert_placeholders(struct address_space *mapping,
+                                  struct page **pages, unsigned long start,
+                                  unsigned long end, unsigned long nr,
+                                  gfp_t gfp_mask,
+                                  int wait);
 extern __deprecated_for_modules struct page * find_trylock_page(
 			struct address_space *mapping, unsigned long index);
 extern struct page * find_or_create_page(struct address_space *mapping,
@@ -86,6 +91,15 @@ unsigned find_get_pages_contig(struct ad
 			       unsigned int nr_pages, struct page **pages);
 unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
 			int tag, unsigned int nr_pages, struct page **pages);
+void remove_placeholder_pages(struct address_space *mapping,
+                             struct page **pages,
+                             unsigned long offset,
+                             unsigned long end,
+                             unsigned long nr);
+void wake_up_placeholder_page(struct page *page);
+void wait_on_placeholder_pages_range(struct address_space *mapping, pgoff_t start,
+			       pgoff_t end);
+
 
 /*
  * Returns locked page at given index in given cache, creating it if needed.
@@ -116,6 +130,8 @@ int add_to_page_cache_lru(struct page *p
 				unsigned long index, gfp_t gfp_mask);
 extern void remove_from_page_cache(struct page *page);
 extern void __remove_from_page_cache(struct page *page);
+struct page *radix_tree_lookup_extent(struct radix_tree_root *root,
+					     unsigned long index);
 
 /*
  * Return byte-offset into filesystem object for page.
diff -r 511f067627ac -r 4cac7e560b53 include/linux/radix-tree.h
--- a/include/linux/radix-tree.h	Thu Dec 21 00:20:01 2006 -0800
+++ b/include/linux/radix-tree.h	Thu Dec 21 15:31:30 2006 -0500
@@ -53,6 +53,7 @@ static inline int radix_tree_is_direct_p
 /*** radix-tree API starts here ***/
 
 #define RADIX_TREE_MAX_TAGS 2
+#define RADIX_TREE_MAX_ROOT_TAGS 3
 
 /* root tags are stored in gfp_mask, shifted by __GFP_BITS_SHIFT */
 struct radix_tree_root {
@@ -168,6 +169,7 @@ radix_tree_gang_lookup_tag(struct radix_
 		unsigned long first_index, unsigned int max_items,
 		unsigned int tag);
 int radix_tree_tagged(struct radix_tree_root *root, unsigned int tag);
+void radix_tree_root_tag_set(struct radix_tree_root *root, unsigned int tag);
 
 static inline void radix_tree_preload_end(void)
 {
diff -r 511f067627ac -r 4cac7e560b53 lib/radix-tree.c
--- a/lib/radix-tree.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/lib/radix-tree.c	Thu Dec 21 15:31:30 2006 -0500
@@ -468,6 +468,12 @@ void *radix_tree_tag_set(struct radix_tr
 	return slot;
 }
 EXPORT_SYMBOL(radix_tree_tag_set);
+
+void radix_tree_root_tag_set(struct radix_tree_root *root, unsigned int tag)
+{
+	root_tag_set(root, tag);
+}
+EXPORT_SYMBOL(radix_tree_root_tag_set);
 
 /**
  *	radix_tree_tag_clear - clear a tag on a radix tree node
diff -r 511f067627ac -r 4cac7e560b53 mm/filemap.c
--- a/mm/filemap.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
@@ -44,6 +44,14 @@ generic_file_direct_IO(int rw, struct ki
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	loff_t offset, unsigned long nr_segs);
 
+static wait_queue_head_t *page_waitqueue(struct page *page);
+static void wait_on_placeholder_page(struct address_space *mapping,
+				     struct page *page,
+				     int write_lock);
+
+static struct address_space placeholder_address_space;
+#define PagePlaceHolder(page) ((page)->mapping == &placeholder_address_space)
+
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
  * though.
@@ -421,6 +429,41 @@ int filemap_write_and_wait_range(struct 
 	return err;
 }
 
+/*
+ * When the radix tree has the extent bit set, a lookup needs to search
+ * forward in the tree to find any extent the index might intersect.
+ * When extents are off, a faster radix_tree_lookup can be done instead.
+ *
+ * This does the appropriate lookup based on the PAGE_CACHE_TAG_EXTENTS
+ * bit on the root node
+ */
+struct page *radix_tree_lookup_extent(struct radix_tree_root *root,
+					     unsigned long index)
+{
+	if (radix_tree_tagged(root, PAGE_CACHE_TAG_EXTENTS)) {
+		struct page *p;
+		unsigned int found;
+		found = radix_tree_gang_lookup(root, (void **)(&p), index, 1);
+		if (found) {
+			if (PagePlaceHolder(p)) {
+				pgoff_t start = p->flags;
+				pgoff_t end = p->index;
+				if (end >= index && start <= index)
+					return p;
+				return NULL;
+			} else {
+				if (p->index == index) {
+					return p;
+				}
+				return NULL;
+			}
+		} else
+			return NULL;
+	}
+	return radix_tree_lookup(root, index);
+}
+
+
 /**
  * add_to_page_cache - add newly allocated pagecache pages
  * @page:	page to add
@@ -437,12 +480,38 @@ int add_to_page_cache(struct page *page,
 int add_to_page_cache(struct page *page, struct address_space *mapping,
 		pgoff_t offset, gfp_t gfp_mask)
 {
-	int error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+	int error;
+again:
+	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
 
 	if (error == 0) {
+		struct page *tmp;
 		write_lock_irq(&mapping->tree_lock);
+		/*
+		 * If extents are on for this radix tree, we have to do
+		 * the more expensive search for an overlapping extent
+		 * before we try to insert.
+		 */
+		if (radix_tree_tagged(&mapping->page_tree,
+				      PAGE_CACHE_TAG_EXTENTS)) {
+			tmp = radix_tree_lookup_extent(&mapping->page_tree,
+						       offset);
+			if (tmp && PagePlaceHolder(tmp))
+				goto exists;
+		}
 		error = radix_tree_insert(&mapping->page_tree, offset, page);
-		if (!error) {
+		if (error == -EEXIST && (gfp_mask & __GFP_WAIT)) {
+			tmp = radix_tree_lookup_extent(&mapping->page_tree,
+						       offset);
+			if (tmp && PagePlaceHolder(tmp)) {
+exists:
+				radix_tree_preload_end();
+				/* drops the lock */
+				wait_on_placeholder_page(mapping, tmp, 1);
+				goto again;
+			}
+		}
+		if (!error && !PagePlaceHolder(page)) {
 			page_cache_get(page);
 			SetPageLocked(page);
 			page->mapping = mapping;
@@ -516,6 +585,92 @@ void fastcall wait_on_page_bit(struct pa
 }
 EXPORT_SYMBOL(wait_on_page_bit);
 
+/*
+ * Call with either a read lock or a write lock on the mapping tree.
+ *
+ * placeholder pages can't be tested or checked without the tree lock held
+ *
+ * In order to wait for the placeholders without losing a wakeup from someone
+ * removing them, we have to prepare_to_wait before dropping the tree lock.
+ *
+ * The lock is dropped just before waiting for the place holder.  It is not
+ * retaken before returning.
+ */
+static void wait_on_placeholder_page(struct address_space *mapping,
+				     struct page *page,
+				     int write_lock)
+{
+	DEFINE_WAIT(wait);
+	wait_queue_head_t *wqh = page_waitqueue(page);
+	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
+	if (write_lock)
+		write_unlock_irq(&mapping->tree_lock);
+	else
+		read_unlock_irq(&mapping->tree_lock);
+	io_schedule();
+	finish_wait(wqh, &wait);
+}
+
+void wake_up_placeholder_page(struct page *page)
+{
+	__wake_up_bit(page_waitqueue(page), &page->flags, PG_locked);
+}
+EXPORT_SYMBOL_GPL(wake_up_placeholder_page);
+
+/**
+ * wait_on_placeholder_pages_range - gang placeholder page waiter
+ * @mapping:	The address_space to search
+ * @start:	The starting page index
+ * @end:	The max page index (inclusive)
+ *
+ * wait_on_placeholder_pages_range() will search for and wait on any
+ * placeholder pages in the given range of the mapping.
+ *
+ * On return, the range has no placeholder pages sitting in it.
+ */
+void wait_on_placeholder_pages_range(struct address_space *mapping,
+			       pgoff_t start, pgoff_t end)
+{
+	unsigned int i;
+	unsigned int ret;
+	struct page *pages[8];
+	pgoff_t cur = start;
+	pgoff_t highest = start;
+
+	/*
+	 * we expect a very small number of place holder pages, so
+	 * this code isn't trying to be very fast.
+	 */
+again:
+	read_lock_irq(&mapping->tree_lock);
+	ret = radix_tree_gang_lookup(&mapping->page_tree,
+				(void **)pages, cur, ARRAY_SIZE(pages));
+	for (i = 0; i < ret; i++) {
+		if (PagePlaceHolder(pages[i])) {
+			if (pages[i]->flags > end)
+				goto done;
+			/* drops the lock */
+			wait_on_placeholder_page(mapping, pages[i], 0);
+			goto again;
+		}
+		if (pages[i]->index > highest)
+			highest = pages[i]->index;
+		if (pages[i]->index > end)
+			goto done;
+	}
+	if (highest < end && ret == ARRAY_SIZE(pages)) {
+		cur = highest;
+		/* "again" retakes the lock, so always drop it here */
+		read_unlock_irq(&mapping->tree_lock);
+		if (need_resched())
+			cond_resched();
+		goto again;
+	}
+done:
+	read_unlock_irq(&mapping->tree_lock);
+}
+EXPORT_SYMBOL_GPL(wait_on_placeholder_pages_range);
+
 /**
  * unlock_page - unlock a locked page
  * @page: the page
@@ -532,6 +687,7 @@ EXPORT_SYMBOL(wait_on_page_bit);
  */
 void fastcall unlock_page(struct page *page)
 {
+	BUG_ON(PagePlaceHolder(page));
 	smp_mb__before_clear_bit();
 	if (!TestClearPageLocked(page))
 		BUG();
@@ -568,6 +724,7 @@ void fastcall __lock_page(struct page *p
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
+	BUG_ON(PagePlaceHolder(page));
 	__wait_on_bit_lock(page_waitqueue(page), &wait, sync_page,
 							TASK_UNINTERRUPTIBLE);
 }
@@ -580,6 +737,7 @@ void fastcall __lock_page_nosync(struct 
 void fastcall __lock_page_nosync(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
+	BUG_ON(PagePlaceHolder(page));
 	__wait_on_bit_lock(page_waitqueue(page), &wait, __sleep_on_page_lock,
 							TASK_UNINTERRUPTIBLE);
 }
@@ -597,13 +755,281 @@ struct page * find_get_page(struct addre
 	struct page *page;
 
 	read_lock_irq(&mapping->tree_lock);
-	page = radix_tree_lookup(&mapping->page_tree, offset);
-	if (page)
-		page_cache_get(page);
+	page = radix_tree_lookup_extent(&mapping->page_tree, offset);
+	if (page) {
+		if (PagePlaceHolder(page))
+			page = NULL;
+		else
+			page_cache_get(page);
+	}
 	read_unlock_irq(&mapping->tree_lock);
 	return page;
 }
 EXPORT_SYMBOL(find_get_page);
+
+/**
+ * remove_placeholder_pages - remove a range of placeholder or locked pages
+ * @mapping: the page's address_space
+ * @pages: an array of page pointers to use for gang lookups
+ * @start: the search starting point
+ * @end: the search end point (offsets >= end are not touched)
+ * @nr: the size of the pages array.
+ *
+ * Any placeholder pages in the range specified are removed from the
+ * radix tree, their waiters are woken, and they are freed.  Any real
+ * pages are unlocked and released.
+ */
+void remove_placeholder_pages(struct address_space *mapping,
+			     struct page **pages,
+			     unsigned long start,
+			     unsigned long end,
+			     unsigned long nr)
+{
+	struct page *page;
+	int ret;
+	int i;
+	unsigned long num;
+
+	write_lock_irq(&mapping->tree_lock);
+	while (start < end) {
+		num = min(nr, end-start);
+		ret = radix_tree_gang_lookup(&mapping->page_tree,
+						(void **)pages, start, num);
+		for (i = 0; i < ret; i++) {
+			page = pages[i];
+			if (PagePlaceHolder(page)) {
+				if (page->index >= end)
+					break;
+				radix_tree_delete(&mapping->page_tree,
+						  page->index);
+				start = page->index + 1;
+				wake_up_placeholder_page(page);
+				kfree(page);
+			} else {
+				if (page->index >= end)
+					break;
+				unlock_page(page);
+				page_cache_release(page);
+				start = page->index + 1;
+			}
+		}
+		if (need_resched()) {
+			write_unlock_irq(&mapping->tree_lock);
+			cond_resched();
+			write_lock_irq(&mapping->tree_lock);
+		}
+	}
+	write_unlock_irq(&mapping->tree_lock);
+}
+EXPORT_SYMBOL_GPL(remove_placeholder_pages);
+
+/*
+ * a helper function to insert a placeholder covering a range of slots
+ * in the radix tree.  The placeholder is stored at insert->index (the
+ * last slot in the range) and insert->flags holds the first offset it
+ * covers.  This could probably use an optimized version in the radix
+ * code.
+ *
+ * returns zero on success or the error from radix_tree_insert().
+ */
+static int insert_placeholder(struct address_space *mapping,
+					 struct page *insert)
+{
+	int err;
+	unsigned int found;
+	struct page *debug_page;
+	/* sanity check, make sure other extents don't exist in this range */
+	found = radix_tree_gang_lookup(&mapping->page_tree,
+				    (void **)(&debug_page),
+				    insert->flags, 1);
+	BUG_ON(found > 0 && debug_page->flags <= (insert->index));
+	err = radix_tree_insert(&mapping->page_tree, insert->index, insert);
+	return err;
+}
+
+
+static struct page *alloc_placeholder(gfp_t gfp_mask)
+{
+	struct page *p;
+	p = kmalloc(sizeof(*p), gfp_mask);
+	if (p) {
+		memset(p, 0, sizeof(*p));
+		p->mapping = &placeholder_address_space;
+	}
+	return p;
+}
+
+/**
+ * find_or_insert_placeholders - locate a group of pagecache pages or insert one
+ * @mapping: the page's address_space
+ * @pages: an array of page pointers to use for gang lookups
+ * @start: the search starting point
+ * @end: the search end point (offsets >= end are not touched)
+ * @nr: the size of the pages array.
+ * @gfp_mask: page allocation mode
+ * @iowait: 1 if you want to wait for dirty or writeback pages (the range
+ *          is written out and waited on, then the search resumes).
+ *
+ * This locks down a range of offsets in the address space.  Any pages
+ * already present are locked and a placeholder page is inserted into
+ * the radix tree for any offsets without pages.
+ */
+int find_or_insert_placeholders(struct address_space *mapping,
+				  struct page **pages, unsigned long start,
+				  unsigned long end, unsigned long nr,
+				  gfp_t gfp_mask,
+				  int iowait)
+{
+	int err = 0;
+	int i, ret;
+	unsigned long cur = start;
+	struct page *page;
+	int restart;
+	struct page *insert = NULL;
+	/*
+	 * this gets complicated.  Placeholders and page locks need to
+	 * be taken in order.  We use gang lookup to cut down on the cpu
+	 * cost, but we need to keep track of holes in the results and
+	 * insert placeholders as appropriate.
+	 *
+	 * If a locked page or a placeholder is found, we wait for it and
+	 * pick up where we left off.  If a dirty or PG_writeback page is found
+	 * and iowait==1, we have to drop all of our locks, kick/wait for the
+	 * io and resume again.
+	 */
+repeat:
+	if (!insert) {
+		insert = alloc_placeholder(gfp_mask);
+		if (!insert) {
+			err = -ENOMEM;
+			goto fail;
+		}
+	}
+	if (cur != start )
+		cond_resched();
+	err = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+	if (err)
+		goto fail;
+	write_lock_irq(&mapping->tree_lock);
+
+	/* only set the extent tag if we are inserting placeholders for more
+	 * than one page worth of slots.  This way small random ios don't
+	 * suffer from slower lookups.
+	 */
+	if (cur == start && end - start > 1)
+		radix_tree_root_tag_set(&mapping->page_tree,
+					PAGE_CACHE_TAG_EXTENTS);
+repeat_lock:
+	ret = radix_tree_gang_lookup(&mapping->page_tree,
+					(void **)pages, cur,
+					min(nr, end-cur));
+	for (i = 0 ; i < ret ; i++) {
+		restart = 0;
+		page = pages[i];
+
+		if (PagePlaceHolder(page) && page->flags < end) {
+			radix_tree_preload_end();
+			/* drops the lock */
+			wait_on_placeholder_page(mapping, page, 1);
+			goto repeat;
+		}
+
+		if (page->index > cur) {
+			unsigned long top = min(end, page->index);
+			insert->index = top - 1;
+			insert->flags = cur;
+			err = insert_placeholder(mapping, insert);
+			write_unlock_irq(&mapping->tree_lock);
+			radix_tree_preload_end();
+			insert = NULL;
+			if (err)
+				goto fail;
+			cur = top;
+			if (cur < end)
+				goto repeat;
+			else
+				goto done;
+		}
+		if (page->index >= end) {
+			ret = 0;
+			break;
+		}
+		page_cache_get(page);
+		BUG_ON(page->index != cur);
+		BUG_ON(PagePlaceHolder(page));
+		if (TestSetPageLocked(page)) {
+			unsigned long tmpoff = page->index;
+			page_cache_get(page);
+			write_unlock_irq(&mapping->tree_lock);
+			radix_tree_preload_end();
+			__lock_page(page);
+			/* Has the page been truncated while we slept? */
+			if (unlikely(page->mapping != mapping ||
+				     page->index != tmpoff)) {
+				unlock_page(page);
+				page_cache_release(page);
+				goto repeat;
+			} else {
+				/* we've locked the page, but  we need to
+				 *  check it for dirty/writeback
+				 */
+				restart = 1;
+			}
+		}
+		if (iowait && (PageDirty(page) || PageWriteback(page))) {
+			unlock_page(page);
+			page_cache_release(page);
+			if (!restart) {
+				write_unlock_irq(&mapping->tree_lock);
+				radix_tree_preload_end();
+			}
+			err = filemap_write_and_wait_range(mapping,
+						 cur << PAGE_CACHE_SHIFT,
+						 end << PAGE_CACHE_SHIFT);
+			if (err)
+				goto fail;
+			goto repeat;
+		}
+		cur++;
+		if (restart)
+			goto repeat;
+		if (cur >= end)
+			break;
+	}
+
+	/* we haven't yet filled the range */
+	if (cur < end) {
+		/* if the search filled our array, there is more to do. */
+		if (ret && ret == nr)
+			goto repeat_lock;
+
+		/* otherwise insert placeholders for the remaining offsets */
+		insert->index = end - 1;
+		insert->flags = cur;
+		err = insert_placeholder(mapping, insert);
+		write_unlock_irq(&mapping->tree_lock);
+		radix_tree_preload_end();
+		if (err)
+			goto fail;
+		insert = NULL;
+		cur = end;
+	} else {
+		write_unlock_irq(&mapping->tree_lock);
+		radix_tree_preload_end();
+	}
+done:
+	BUG_ON(cur < end);
+	BUG_ON(cur > end);
+	if (insert)
+		kfree(insert);
+	return err;
+fail:
+	remove_placeholder_pages(mapping, pages, start, cur, nr);
+	if (insert)
+		kfree(insert);
+	return err;
+}
+EXPORT_SYMBOL_GPL(find_or_insert_placeholders);
 
 /**
  * find_trylock_page - find and lock a page
@@ -617,8 +1043,8 @@ struct page *find_trylock_page(struct ad
 	struct page *page;
 
 	read_lock_irq(&mapping->tree_lock);
-	page = radix_tree_lookup(&mapping->page_tree, offset);
-	if (page && TestSetPageLocked(page))
+	page = radix_tree_lookup_extent(&mapping->page_tree, offset);
+	if (page && (PagePlaceHolder(page) || TestSetPageLocked(page)))
 		page = NULL;
 	read_unlock_irq(&mapping->tree_lock);
 	return page;
@@ -642,8 +1068,14 @@ struct page *find_lock_page(struct addre
 
 	read_lock_irq(&mapping->tree_lock);
 repeat:
-	page = radix_tree_lookup(&mapping->page_tree, offset);
+	page = radix_tree_lookup_extent(&mapping->page_tree, offset);
 	if (page) {
+		if (PagePlaceHolder(page)) {
+			/* drops the lock */
+			wait_on_placeholder_page(mapping, page, 0);
+			read_lock_irq(&mapping->tree_lock);
+			goto repeat;
+		}
 		page_cache_get(page);
 		if (TestSetPageLocked(page)) {
 			read_unlock_irq(&mapping->tree_lock);
@@ -727,14 +1159,25 @@ unsigned find_get_pages(struct address_s
 unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
 			    unsigned int nr_pages, struct page **pages)
 {
-	unsigned int i;
+	unsigned int i = 0;
 	unsigned int ret;
 
 	read_lock_irq(&mapping->tree_lock);
 	ret = radix_tree_gang_lookup(&mapping->page_tree,
 				(void **)pages, start, nr_pages);
-	for (i = 0; i < ret; i++)
-		page_cache_get(pages[i]);
+	while(i < ret) {
+		if (PagePlaceHolder(pages[i])) {
+			/* we can't return a place holder, shift it away */
+			if (i + 1 < ret) {
+				memmove(&pages[i], &pages[i+1],
+		                       (ret - i - 1) * sizeof(struct page *));
+			}
+			ret--;
+			continue;
+		} else
+			page_cache_get(pages[i]);
+		i++;
+	}
 	read_unlock_irq(&mapping->tree_lock);
 	return ret;
 }
@@ -761,6 +1204,8 @@ unsigned find_get_pages_contig(struct ad
 	ret = radix_tree_gang_lookup(&mapping->page_tree,
 				(void **)pages, index, nr_pages);
 	for (i = 0; i < ret; i++) {
+		if (PagePlaceHolder(pages[i]))
+			break;
 		if (pages[i]->mapping == NULL || pages[i]->index != index)
 			break;
 
@@ -785,14 +1230,25 @@ unsigned find_get_pages_tag(struct addre
 unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
 			int tag, unsigned int nr_pages, struct page **pages)
 {
-	unsigned int i;
+	unsigned int i = 0;
 	unsigned int ret;
 
 	read_lock_irq(&mapping->tree_lock);
 	ret = radix_tree_gang_lookup_tag(&mapping->page_tree,
 				(void **)pages, *index, nr_pages, tag);
-	for (i = 0; i < ret; i++)
-		page_cache_get(pages[i]);
+	while(i < ret) {
+		if (PagePlaceHolder(pages[i])) {
+			/* we can't return a place holder, shift it away */
+			if (i + 1 < ret) {
+				memmove(&pages[i], &pages[i+1],
+		                       (ret - i - 1) * sizeof(struct page *));
+			}
+			ret--;
+			continue;
+		} else
+			page_cache_get(pages[i]);
+		i++;
+	}
 	if (ret)
 		*index = pages[ret - 1]->index + 1;
 	read_unlock_irq(&mapping->tree_lock);
@@ -2406,18 +2862,15 @@ generic_file_direct_IO(int rw, struct ki
 			unmap_mapping_range(mapping, offset, write_len, 0);
 	}
 
-	retval = filemap_write_and_wait(mapping);
-	if (retval == 0) {
-		retval = mapping->a_ops->direct_IO(rw, iocb, iov,
-						offset, nr_segs);
-		if (rw == WRITE && mapping->nrpages) {
-			pgoff_t end = (offset + write_len - 1)
-						>> PAGE_CACHE_SHIFT;
-			int err = invalidate_inode_pages2_range(mapping,
-					offset >> PAGE_CACHE_SHIFT, end);
-			if (err)
-				retval = err;
-		}
+	retval = mapping->a_ops->direct_IO(rw, iocb, iov,
+					offset, nr_segs);
+	if (rw == WRITE && mapping->nrpages) {
+		pgoff_t end = (offset + write_len - 1)
+					>> PAGE_CACHE_SHIFT;
+		int err = invalidate_inode_pages2_range(mapping,
+				offset >> PAGE_CACHE_SHIFT, end);
+		if (err)
+			retval = err;
 	}
 	return retval;
 }
diff -r 511f067627ac -r 4cac7e560b53 mm/migrate.c
--- a/mm/migrate.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/mm/migrate.c	Thu Dec 21 15:31:30 2006 -0500
@@ -305,8 +305,12 @@ static int migrate_page_move_mapping(str
 
 	write_lock_irq(&mapping->tree_lock);
 
+	/*
+	 * we don't need to worry about placeholders here,
+	 * the slot in the tree is verified
+	 */
 	pslot = radix_tree_lookup_slot(&mapping->page_tree,
- 					page_index(page));
+					page_index(page));
 
 	if (page_count(page) != 2 + !!PagePrivate(page) ||
 			(struct page *)radix_tree_deref_slot(pslot) != page) {
diff -r 511f067627ac -r 4cac7e560b53 mm/readahead.c
--- a/mm/readahead.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/mm/readahead.c	Thu Dec 21 15:31:30 2006 -0500
@@ -288,7 +288,8 @@ __do_page_cache_readahead(struct address
 		if (page_offset > end_index)
 			break;
 
-		page = radix_tree_lookup(&mapping->page_tree, page_offset);
+		page = radix_tree_lookup_extent(&mapping->page_tree,
+						page_offset);
 		if (page)
 			continue;
 
diff -r 511f067627ac -r 4cac7e560b53 mm/truncate.c
--- a/mm/truncate.c	Thu Dec 21 00:20:01 2006 -0800
+++ b/mm/truncate.c	Thu Dec 21 15:31:30 2006 -0500
@@ -209,6 +209,7 @@ void truncate_inode_pages_range(struct a
 		}
 		pagevec_release(&pvec);
 	}
+	wait_on_placeholder_pages_range(mapping, start, end);
 }
 EXPORT_SYMBOL(truncate_inode_pages_range);
 




* (unknown)
@ 2006-12-21 20:56 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 20:56 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:34:57 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 0 of 8] O_DIRECT locking rework v3
Message-Id: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:34:56 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

I took a small detour on the O_DIRECT locking rework to look at
different alternatives for range locking in the pagecache.  After
benchmarking a few different types of trees, I didn't find anything that
would match radix for random lookup performance.

This patchset does ranges in the radix tree by inserting a placeholder
at the last slot in the range and forcing all lookups to search forward.
It means radix_tree_gang_lookup must be used instead of
radix_tree_lookup, but this is still faster for random searches than
anything else I tried.

A bit is set on the radix root node to indicate if range searching is
required.  So, when O_DIRECT isn't used or O_DIRECT is used for tiny
ios, no range lookups are done.
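
A minimal user-space sketch of the lookup rule described above.  This is
not the kernel code: a sorted array stands in for the radix tree, and
the names struct entry, struct toy_mapping, lookup_forward and lookup
are made up for illustration.  A placeholder is stored at the last index
of its range and remembers where the range starts, so a lookup scans
forward from the requested offset to the first populated slot and then
checks whether that slot is a placeholder covering the offset.

```c
#include <assert.h>
#include <stddef.h>

/* One populated slot in the toy "tree". */
struct entry {
	unsigned long index;	/* slot the entry is stored at */
	unsigned long start;	/* first offset covered (placeholders only) */
	int placeholder;	/* nonzero for a range placeholder */
};

/* Toy mapping: entries sorted by index, plus a "range search needed" bit. */
struct toy_mapping {
	const struct entry *slots;
	size_t nr;
	int has_extents;
};

/* Forward search: first entry stored at an index >= offset, or NULL. */
static const struct entry *lookup_forward(const struct toy_mapping *m,
					  unsigned long offset)
{
	size_t i;

	for (i = 0; i < m->nr; i++)
		if (m->slots[i].index >= offset)
			return &m->slots[i];
	return NULL;
}

/*
 * Return the entry that owns @offset: an exact match, or a placeholder
 * whose range [start, index] contains it.  If no range was ever
 * inserted (has_extents == 0), only exact matches count.
 */
static const struct entry *lookup(const struct toy_mapping *m,
				  unsigned long offset)
{
	const struct entry *e = lookup_forward(m, offset);

	if (!e)
		return NULL;
	if (e->index == offset)
		return e;
	if (m->has_extents && e->placeholder && e->start <= offset)
		return e;
	return NULL;
}

/* Sample data: a real page at 3 and one placeholder locking 10..19. */
static const struct entry sample_slots[] = {
	{ .index = 3,  .start = 3,  .placeholder = 0 },
	{ .index = 19, .start = 10, .placeholder = 1 },
};
static const struct toy_mapping sample = { sample_slots, 2, 1 };
```

With the sample data, lookup(&sample, 12) returns the placeholder stored
at slot 19, while lookup(&sample, 5) returns NULL because the next
populated slot is a placeholder whose range starts at 10.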

With O_DIRECT in use, only a single placeholder is inserted
to lock down the entire range for a given IO.  This should
stand up pretty well for those monster XFS workloads.

If the mapping has pages on it, I do one placeholder for every 64k
to limit the number of pages pinned down during the IO.  There's
lots of hand waving here; it may be best to get rid of this special
case.

Patches are against Linus' git from today.  Testing has been light; I'm
mostly looking for comments on the range locking tricks.

-chris




* (unknown)
@ 2006-12-21 20:56 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 20:56 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:02 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 5 of 8] Make ext3 safe for the new DIO locking rules
X-Mercurial-Node: bebaf8972a3198faf661ab988af0f53cd49856bb
Message-Id: <bebaf8972a3198faf661.1166733301@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:01 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

This creates a version of ext3_get_block that starts and ends a transaction.

By starting and ending the transaction inside get_block, this avoids
lock inversion problems when the DIO code tries to take page locks
inside blockdev_direct_IO (transaction locks must always be taken
after page locks).
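
The ordering rule can be illustrated with a small user-space model.
Nothing here is ext3 or JBD code: every mock_* name, txn_depth and
order_violations are hypothetical stand-ins.  The model counts how often
a page lock is taken while a journal handle is open, once with a single
handle held across the whole IO (the old shape of ext3_direct_IO for
writes) and once with a handle opened and closed inside each get_block
call.

```c
#include <assert.h>

static int txn_depth;		/* > 0 while a mock journal handle is open */
static int order_violations;	/* page locks taken with a handle open */

static void mock_journal_start(void) { txn_depth++; }
static void mock_journal_stop(void)  { txn_depth--; }

/* Taking a page lock with a handle open inverts the lock order. */
static void mock_lock_page(void)
{
	if (txn_depth > 0)
		order_violations++;
}

/* Stand-in for the block mapping work; it needs an open handle. */
static int mock_get_block(void)
{
	assert(txn_depth > 0);
	return 0;
}

/* Per-call transaction, in the spirit of ext3_get_block_direct_IO(). */
static int mock_get_block_direct_io(void)
{
	int ret;

	mock_journal_start();
	ret = mock_get_block();
	mock_journal_stop();
	return ret;
}

/* Mock DIO loop: lock a page range, then map blocks, nr times. */
static int mock_direct_io(int nr, int (*get_block)(void))
{
	int i, ret = 0;

	for (i = 0; i < nr && !ret; i++) {
		mock_lock_page();
		ret = get_block();
	}
	return ret;
}

/* One handle held across the whole IO; returns the violation count. */
static int run_with_outer_handle(int nr)
{
	order_violations = 0;
	mock_journal_start();
	mock_direct_io(nr, mock_get_block);
	mock_journal_stop();
	return order_violations;
}

/* Handle opened and closed per get_block call; returns violations. */
static int run_with_per_call_handle(int nr)
{
	order_violations = 0;
	mock_direct_io(nr, mock_get_block_direct_io);
	return order_violations;
}
```

In this model every page lock in the outer-handle run is taken under an
open handle, while the per-call run never takes one that way.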

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 385bc75d9266 -r bebaf8972a31 fs/ext3/inode.c
--- a/fs/ext3/inode.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/ext3/inode.c	Thu Dec 21 15:31:30 2006 -0500
@@ -1673,6 +1673,30 @@ static int ext3_releasepage(struct page 
 	return journal_try_to_free_buffers(journal, page, wait);
 }
 
+static int ext3_get_block_direct_IO(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh_result, int create)
+{
+	int ret = 0;
+	handle_t *handle = ext3_journal_start(inode, DIO_CREDITS);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		goto out;
+	}
+	ret = ext3_get_block(inode, iblock, bh_result, create);
+	/*
+	 * Reacquire the handle: ext3_get_block() can restart the transaction
+	 */
+	handle = journal_current_handle();
+	if (handle) {
+		int err;
+		err = ext3_journal_stop(handle);
+		if (!ret)
+			ret = err;
+	}
+out:
+	return ret;
+}
+
 /*
  * If the O_DIRECT write will extend the file then add this inode to the
  * orphan list.  So recovery will truncate it back to the original size
@@ -1693,39 +1717,58 @@ static ssize_t ext3_direct_IO(int rw, st
 	int orphan = 0;
 	size_t count = iov_length(iov, nr_segs);
 
-	if (rw == WRITE) {
-		loff_t final_size = offset + count;
-
+	if (rw == WRITE && (offset + count > inode->i_size)) { 
 		handle = ext3_journal_start(inode, DIO_CREDITS);
 		if (IS_ERR(handle)) {
 			ret = PTR_ERR(handle);
 			goto out;
 		}
-		if (final_size > inode->i_size) {
-			ret = ext3_orphan_add(handle, inode);
-			if (ret)
-				goto out_stop;
-			orphan = 1;
-			ei->i_disksize = inode->i_size;
-		}
-	}
-
+		ret = ext3_orphan_add(handle, inode);
+		if (ret) {
+			ext3_journal_stop(handle);
+			goto out;
+		}
+		ei->i_disksize = inode->i_size;
+		ret = ext3_journal_stop(handle);
+		if (ret) {
+			/* something has gone horribly wrong, cleanup
+			 * the orphan list in ram
+			 */
+			if (inode->i_nlink)
+				ext3_orphan_del(NULL, inode);
+			goto out;
+		}
+		orphan = 1;
+	}
+
+	/*
+	 * the placeholder page code may take a page lock, so we have
+	 * to stop any running transactions before calling
+	 * blockdev_direct_IO.  Use ext3_get_block_direct_IO to start
+	 * and stop a transaction on each get_block call.
+	 */
 	ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
 				 offset, nr_segs,
-				 ext3_get_block, NULL);
+				 ext3_get_block_direct_IO, NULL);
 
 	/*
 	 * Reacquire the handle: ext3_get_block() can restart the transaction
 	 */
 	handle = journal_current_handle();
 
-out_stop:
-	if (handle) {
+	if (orphan) {
 		int err;
-
-		if (orphan && inode->i_nlink)
+		handle = ext3_journal_start(inode, DIO_CREDITS);
+		if (IS_ERR(handle)) {
+			ret = PTR_ERR(handle);
+			if (inode->i_nlink)
+				ext3_orphan_del(NULL, inode);
+			goto out;
+		}
+
+		if (inode->i_nlink)
 			ext3_orphan_del(handle, inode);
-		if (orphan && ret > 0) {
+		if (ret > 0) {
 			loff_t end = offset + ret;
 			if (end > inode->i_size) {
 				ei->i_disksize = end;




* (unknown)
@ 2006-12-21 20:56 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 20:56 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:01 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 4 of 8] Add flags to control direct IO helpers
X-Mercurial-Node: 385bc75d9266569cff5f0f5fce546cfff4d6fb01
Message-Id: <385bc75d9266569cff5f.1166733300@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:00 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

This creates a number of flags so that filesystems can control
blockdev_direct_IO.  It is based on code from Russell Cattelan.

The new flags are:
DIO_CREATE -- always pass create=1 to get_block on writes.  This allows
	      DIO to fill holes in the file.
DIO_PLACEHOLDERS -- use placeholder pages to provide locking against buffered
	            io and truncates.
DIO_DROP_I_MUTEX -- drop i_mutex before starting the mapping, io submission,
		    or io waiting.  The mutex is still dropped for AIO
		    as well.

Some API changes are made so that filesystems can have more control
over the DIO features.

__blockdev_direct_IO is more or less renamed to blockdev_direct_IO_flags.
All waiting and invalidating of page cache data is pushed down into
blockdev_direct_IO_flags (and removed from mm/filemap.c)

direct_io_worker is exported into the wild.  Filesystems that want to be
special can pull out the bits of blockdev_direct_IO_flags they care about
and then call direct_io_worker directly.
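
A rough user-space sketch of how the three flags steer the decisions
made in the diff below.  The flag values are the ones this patch adds to
include/linux/fs.h; TOY_READ/TOY_WRITE and the helper names are
hypothetical and exist only to make the conditions testable outside the
kernel.

```c
#include <assert.h>

/* Flag values as defined by this patch in include/linux/fs.h. */
#define DIO_PLACEHOLDERS (1 << 0)  /* insert placeholder pages */
#define DIO_CREATE	 (1 << 1)  /* pass create=1 to get_block on writes */
#define DIO_DROP_I_MUTEX (1 << 2)  /* drop i_mutex during writes */

/* Hypothetical stand-ins for the kernel's READ and WRITE. */
enum { TOY_READ = 0, TOY_WRITE = 1 };

/* The flag set passed by the blockdev_direct_IO() wrapper. */
static unsigned default_dio_flags(void)
{
	return DIO_PLACEHOLDERS | DIO_CREATE | DIO_DROP_I_MUTEX;
}

/* As in get_more_blocks(): create only on writes with DIO_CREATE set. */
static int want_create(int rw, unsigned flags)
{
	if (flags & DIO_CREATE)
		return (rw & TOY_WRITE) != 0;
	return 0;
}

/*
 * As in blockdev_direct_IO_flags(): i_mutex is dropped only for writes
 * that end inside i_size, and only when the caller asked for it.
 */
static int may_drop_i_mutex(int rw, unsigned flags,
			    long long end, long long i_size)
{
	return (rw & TOY_WRITE) && end <= i_size &&
	       (flags & DIO_DROP_I_MUTEX) != 0;
}

/* Without placeholders the range has to be flushed before the IO. */
static int needs_explicit_flush(unsigned flags,
				long long offset, long long end)
{
	return !(flags & DIO_PLACEHOLDERS) && end > offset;
}
```

For example, blockdev_direct_IO_no_locking passes flags == 0, so in this
sketch want_create() is 0 even for writes and needs_explicit_flush() is
1 for any non-empty range.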

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r ac51e7a4c7a6 -r 385bc75d9266 fs/direct-io.c
--- a/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
@@ -54,13 +54,6 @@
  *
  * If blkfactor is zero then the user's request was aligned to the filesystem's
  * blocksize.
- *
- * lock_type is DIO_LOCKING for regular files on direct-IO-naive filesystems.
- * This determines whether we need to do the fancy locking which prevents
- * direct-IO from being able to read uninitialised disk blocks.  If its zero
- * (blockdev) this locking is not done, and if it is DIO_OWN_LOCKING i_mutex is
- * not held for the entire direct write (taken briefly, initially, during a
- * direct read though, but its never held for the duration of a direct-IO).
  */
 
 struct dio {
@@ -69,8 +62,7 @@ struct dio {
 	struct inode *inode;
 	int rw;
 	loff_t i_size;			/* i_size when submitted */
-	int lock_type;			/* doesn't change */
-	int reacquire_i_mutex;		/* should we get i_mutex when done? */
+	unsigned flags;			/* doesn't change */
 	unsigned blkbits;		/* doesn't change */
 	unsigned blkfactor;		/* When we're using an alignment which
 					   is finer than the filesystem's soft
@@ -202,7 +194,7 @@ static void unlock_page_range(struct dio
 static void unlock_page_range(struct dio *dio, unsigned long start,
 			      unsigned long nr)
 {
-	if (dio->lock_type != DIO_NO_LOCKING) {
+	if (dio->flags & DIO_PLACEHOLDERS) {
 		remove_placeholder_pages(dio->inode->i_mapping, dio->tmppages,
 					 start, start + nr,
 					 ARRAY_SIZE(dio->tmppages));
@@ -215,13 +207,14 @@ static int lock_page_range(struct dio *d
 	struct address_space *mapping = dio->inode->i_mapping;
 	unsigned long end = start + nr;
 
-	if (dio->lock_type == DIO_NO_LOCKING)
-		return 0;
-	return find_or_insert_placeholders(mapping, dio->tmppages, start, end,
-	                                  ARRAY_SIZE(dio->tmppages),
-					  GFP_KERNEL, 1);
-}
-
+	if (dio->flags & DIO_PLACEHOLDERS) {
+		return find_or_insert_placeholders(mapping, dio->tmppages,
+						   start, end,
+						   ARRAY_SIZE(dio->tmppages),
+						   GFP_KERNEL, 1);
+	}
+	return 0;
+}
 
 /*
  * Get another userspace page.  Returns an ERR_PTR on error.  Pages are
@@ -282,8 +275,6 @@ static int dio_complete(struct dio *dio,
 	unlock_page_range(dio, dio->fspages_start_off,
 			  dio->fspages_end_off - dio->fspages_start_off);
 	dio->fspages_end_off = dio->fspages_start_off;
-	if (dio->reacquire_i_mutex)
-		mutex_lock(&dio->inode->i_mutex);
 
 	if (ret == 0)
 		ret = dio->page_errors;
@@ -569,8 +560,9 @@ static int get_more_blocks(struct dio *d
 		map_bh->b_state = 0;
 		map_bh->b_size = fs_count << dio->inode->i_blkbits;
 
-		create = dio->rw & WRITE;
-		if (dio->lock_type == DIO_NO_LOCKING)
+		if (dio->flags & DIO_CREATE)
+			create = dio->rw & WRITE;
+		else
 			create = 0;
 	        index = fs_startblk >> (PAGE_CACHE_SHIFT -
 		                        dio->inode->i_blkbits);
@@ -996,19 +988,43 @@ out:
 	return ret;
 }
 
-static ssize_t
-direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, 
-	const struct iovec *iov, loff_t offset, unsigned long nr_segs, 
+/*
+ * This does all the real work of the direct io.  Most filesystems want to
+ * call blockdev_direct_IO_flags instead, but if you have exotic locking
+ * routines you can call this directly.
+ *
+ * The flags parameter is a bitmask of:
+ *
+ * DIO_PLACEHOLDERS (use placeholder pages for locking)
+ * DIO_CREATE (pass create=1 to get_block for filling holes or extending)
+ * DIO_DROP_I_MUTEX (drop inode->i_mutex during writes)
+ */
+ssize_t
+direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
+	const struct iovec *iov, loff_t offset, unsigned long nr_segs,
 	unsigned blkbits, get_block_t get_block, dio_iodone_t end_io,
-	struct dio *dio)
-{
-	unsigned long user_addr; 
+	int is_async, unsigned dioflags)
+{
+	unsigned long user_addr;
 	unsigned long flags;
 	int seg;
 	ssize_t ret = 0;
 	ssize_t ret2;
 	size_t bytes;
-
+	struct dio *dio;
+
+	if (rw & WRITE)
+		rw = WRITE_SYNC;
+
+	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
+	ret = -ENOMEM;
+	if (!dio)
+		goto out;
+
+	dio->fspages_start_off = offset >> PAGE_CACHE_SHIFT;
+	dio->fspages_end_off = dio->fspages_start_off;
+	dio->flags = dioflags;
+	dio->is_async = is_async;
 	dio->bio = NULL;
 	dio->inode = inode;
 	dio->rw = rw;
@@ -1156,33 +1172,24 @@ direct_io_worker(int rw, struct kiocb *i
 	} else
 		BUG_ON(ret != -EIOCBQUEUED);
 
+out:
 	return ret;
 }
-
-/*
- * This is a library function for use by filesystem drivers.
- * The locking rules are governed by the dio_lock_type parameter.
- *
- * DIO_NO_LOCKING (no locking, for raw block device access)
- * For writes, i_mutex is not held on entry; it is never taken.
- *
- * DIO_LOCKING (simple locking for regular files)
- * For writes we are called under i_mutex and return with i_mutex held, even
- * though it is internally dropped.
- *
- * DIO_OWN_LOCKING (filesystem provides synchronisation and handling of
- *	uninitialised data, allowing parallel direct readers and writers)
- * For writes we are called without i_mutex, return without it, never touch it.
- * For reads we are called under i_mutex and return with i_mutex held, even
- * though it may be internally dropped.
- *
- * Additional i_alloc_sem locking requirements described inline below.
- */
-ssize_t
-__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
-	struct block_device *bdev, const struct iovec *iov, loff_t offset, 
-	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
-	int dio_lock_type)
+EXPORT_SYMBOL(direct_io_worker);
+
+/*
+ * A utility function for blockdev_direct_IO_flags, this checks
+ * alignment of a O_DIRECT iovec against filesystem and blockdevice
+ * requirements.
+ *
+ * It returns a blkbits value that will work for the io, and returns the
+ * end offset of the io (via blkbits_ret and end_ret).
+ *
+ * The function returns 0 if everything will work or -EINVAL on error
+ */
+int check_dio_alignment(struct inode *inode, struct block_device *bdev,
+			const struct iovec *iov, loff_t offset, unsigned long nr_segs,
+			unsigned *blkbits_ret, loff_t *end_ret)
 {
 	int seg;
 	size_t size;
@@ -1190,13 +1197,7 @@ __blockdev_direct_IO(int rw, struct kioc
 	unsigned blkbits = inode->i_blkbits;
 	unsigned bdev_blkbits = 0;
 	unsigned blocksize_mask = (1 << blkbits) - 1;
-	ssize_t retval = -EINVAL;
 	loff_t end = offset;
-	struct dio *dio;
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-	if (rw & WRITE)
-		rw = WRITE_SYNC;
 
 	if (bdev)
 		bdev_blkbits = blksize_bits(bdev_hardsect_size(bdev));
@@ -1206,7 +1207,7 @@ __blockdev_direct_IO(int rw, struct kioc
 			 blkbits = bdev_blkbits;
 		blocksize_mask = (1 << blkbits) - 1;
 		if (offset & blocksize_mask)
-			goto out;
+			return -EINVAL;
 	}
 
 	/* Check the memory alignment.  Blocks cannot straddle pages */
@@ -1218,29 +1219,60 @@ __blockdev_direct_IO(int rw, struct kioc
 			if (bdev)
 				 blkbits = bdev_blkbits;
 			blocksize_mask = (1 << blkbits) - 1;
-			if ((addr & blocksize_mask) || (size & blocksize_mask))  
-				goto out;
+			if ((addr & blocksize_mask) || (size & blocksize_mask))
+				return -EINVAL;
 		}
 	}
-	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
-	retval = -ENOMEM;
-	if (!dio)
+	*end_ret = end;
+	*blkbits_ret = blkbits;
+	return 0;
+}
+EXPORT_SYMBOL(check_dio_alignment);
+
+/*
+ * This is a library function for use by filesystem drivers.
+ * The flags parameter is a bitmask of:
+ *
+ * DIO_PLACEHOLDERS (use placeholder pages for locking)
+ * DIO_CREATE (pass create=1 to get_block for filling holes)
+ * DIO_DROP_I_MUTEX (drop inode->i_mutex during writes)
+ */
+ssize_t
+blockdev_direct_IO_flags(int rw, struct kiocb *iocb, struct inode *inode,
+	struct block_device *bdev, const struct iovec *iov, loff_t offset, 
+	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
+	unsigned flags)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	unsigned blkbits = 0;
+	ssize_t retval = -EINVAL;
+	loff_t end = 0;
+	int is_async;
+	int grab_i_mutex = 0;
+
+
+	if (check_dio_alignment(inode, bdev, iov, offset, nr_segs,
+				&blkbits, &end))
 		goto out;
 
-	dio->fspages_start_off = offset >> PAGE_CACHE_SHIFT;
-	dio->fspages_end_off = dio->fspages_start_off;
-
-	/*
-	 * For block device access DIO_NO_LOCKING is used,
-	 *	neither readers nor writers do any locking at all
-	 * For regular files using DIO_LOCKING,
-	 *	No locks are taken
-	 * For regular files using DIO_OWN_LOCKING,
-	 *	neither readers nor writers take any locks here
-	 */
-	dio->lock_type = dio_lock_type;
-
-	if (dio->lock_type == DIO_NO_LOCKING && end > offset) {
+	if (rw & WRITE) {
+		/*
+		 * If it's a write, unmap all mmappings of the file up-front.
+		 * This will cause any pte dirty bits to be propagated into
+		 * the pageframes for the subsequent filemap_write_and_wait().
+		 */
+		if (mapping_mapped(mapping))
+			unmap_mapping_range(mapping, offset, end - offset, 0);
+		if (end <= i_size_read(inode) && (flags & DIO_DROP_I_MUTEX)) {
+			mutex_unlock(&inode->i_mutex);
+			grab_i_mutex = 1;
+		}
+	}
+	/*
+	 * the placeholder code does filemap_write_and_wait, so if we
+	 * aren't using placeholders we have to do it here
+	 */
+	if (!(flags & DIO_PLACEHOLDERS) && end > offset) {
 		retval = filemap_write_and_wait_range(mapping, offset, end - 1);
 		if (retval)
 			goto out;
@@ -1252,19 +1284,30 @@ __blockdev_direct_IO(int rw, struct kioc
 	 * even for AIO, we need to wait for i/o to complete before
 	 * returning in this case.
 	 */
-	dio->is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) &&
+	is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) &&
 		(end > i_size_read(inode)));
 
-	/* if our write is inside i_size, we can drop i_mutex */
-	dio->reacquire_i_mutex = 0;
-	if ((rw & WRITE) && dio_lock_type == DIO_LOCKING &&
-	   end <= i_size_read(inode) && is_sync_kiocb(iocb)) {
-		dio->reacquire_i_mutex = 1;
-		mutex_unlock(&inode->i_mutex);
-	}
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
-				nr_segs, blkbits, get_block, end_io, dio);
+				nr_segs, blkbits, get_block, end_io, is_async,
+				flags);
 out:
+	if (grab_i_mutex)
+		mutex_lock(&inode->i_mutex);
+
+	if ((rw & WRITE) && mapping->nrpages) {
+		int err;
+		/* O_DIRECT is allowed to drop i_mutex, so more data
+		 * could have been dirtied by others.  Start io one more
+		 * time
+		 */
+		err = filemap_write_and_wait_range(mapping, offset, end - 1);
+		if (!err)
+			err = invalidate_inode_pages2_range(mapping,
+					offset >> PAGE_CACHE_SHIFT,
+					(end - 1) >> PAGE_CACHE_SHIFT);
+		if (!retval && err)
+			retval = err;
+	}
 	return retval;
 }
-EXPORT_SYMBOL(__blockdev_direct_IO);
+EXPORT_SYMBOL(blockdev_direct_IO_flags);
diff -r ac51e7a4c7a6 -r 385bc75d9266 include/linux/fs.h
--- a/include/linux/fs.h	Thu Dec 21 15:31:30 2006 -0500
+++ b/include/linux/fs.h	Thu Dec 21 15:31:30 2006 -0500
@@ -1775,24 +1775,28 @@ static inline void do_generic_file_read(
 }
 
 #ifdef CONFIG_BLOCK
-ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
+int check_dio_alignment(struct inode *inode, struct block_device *bdev,
+                        const struct iovec *iov, loff_t offset, unsigned long nr_segs,
+			                        unsigned *blkbits_ret, loff_t *end_ret);
+
+ssize_t blockdev_direct_IO_flags(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset,
 	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
-	int lock_type);
-
-enum {
-	DIO_LOCKING = 1, /* need locking between buffered and direct access */
-	DIO_NO_LOCKING,  /* bdev; no locking at all between buffered/direct */
-	DIO_OWN_LOCKING, /* filesystem locks buffered and direct internally */
-};
+	unsigned int dio_flags);
+
+#define DIO_PLACEHOLDERS (1 << 0)  /* insert placeholder pages */
+#define DIO_CREATE	(1 << 1)  /* pass create=1 to get_block when writing */
+#define DIO_DROP_I_MUTEX (1 << 2) /* drop i_mutex during writes */
 
 static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
 	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
 	loff_t offset, unsigned long nr_segs, get_block_t get_block,
 	dio_iodone_t end_io)
 {
-	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_block, end_io, DIO_LOCKING);
+	/* locking is on, FS wants to fill holes w/get_block */
+	return blockdev_direct_IO_flags(rw, iocb, inode, bdev, iov, offset,
+				nr_segs, get_block, end_io, DIO_PLACEHOLDERS |
+				DIO_CREATE | DIO_DROP_I_MUTEX);
 }
 
 static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct kiocb *iocb,
@@ -1800,17 +1804,9 @@ static inline ssize_t blockdev_direct_IO
 	loff_t offset, unsigned long nr_segs, get_block_t get_block,
 	dio_iodone_t end_io)
 {
-	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_block, end_io, DIO_NO_LOCKING);
-}
-
-static inline ssize_t blockdev_direct_IO_own_locking(int rw, struct kiocb *iocb,
-	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
-	loff_t offset, unsigned long nr_segs, get_block_t get_block,
-	dio_iodone_t end_io)
-{
-	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_block, end_io, DIO_OWN_LOCKING);
+	/* locking is off, create is off */
+	return blockdev_direct_IO_flags(rw, iocb, inode, bdev, iov, offset,
+				nr_segs, get_block, end_io, 0);
 }
 #endif
 
diff -r ac51e7a4c7a6 -r 385bc75d9266 mm/filemap.c
--- a/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
@@ -40,7 +40,7 @@
 
 #include <asm/mman.h>
 
-static ssize_t
+static inline ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	loff_t offset, unsigned long nr_segs);
 
@@ -2842,46 +2842,12 @@ EXPORT_SYMBOL(generic_file_aio_write);
  * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if something
  * went wrong during pagecache shootdown.
  */
-static ssize_t
+static inline ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	loff_t offset, unsigned long nr_segs)
 {
-	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	ssize_t retval;
-	size_t write_len = 0;
-
-	/*
-	 * If it's a write, unmap all mmappings of the file up-front.  This
-	 * will cause any pte dirty bits to be propagated into the pageframes
-	 * for the subsequent filemap_write_and_wait().
-	 */
-	if (rw == WRITE) {
-		write_len = iov_length(iov, nr_segs);
-	       	if (mapping_mapped(mapping))
-			unmap_mapping_range(mapping, offset, write_len, 0);
-	}
-
-	retval = mapping->a_ops->direct_IO(rw, iocb, iov,
-					offset, nr_segs);
-	if (rw == WRITE && mapping->nrpages) {
-		int err;
-		pgoff_t end = (offset + write_len - 1)
-					>> PAGE_CACHE_SHIFT;
-
-		/* O_DIRECT is allowed to drop i_mutex, so more data
-		 * could have been dirtied by others.  Start io one more
-		 * time
-		 */
-		err = filemap_fdatawrite_range(mapping, offset,
-		                               offset + write_len - 1);
-		if (!err)
-			err = invalidate_inode_pages2_range(mapping,
-					offset >> PAGE_CACHE_SHIFT, end);
-		if (err)
-			retval = err;
-	}
-	return retval;
+	return iocb->ki_filp->f_mapping->a_ops->direct_IO(rw, iocb, iov,
+							  offset, nr_segs);
 }
 
 /**
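
The fs.h hunk above replaces the three-value lock_type enum with a bit
mask.  As a reading aid, here is a minimal sketch of how the old modes
appear to map onto the new flags.  The flag values are copied from the
hunk; the LEGACY_* enum and legacy_to_dio_flags() are hypothetical names
that do not exist in the patch, and the DIO_OWN_LOCKING row is inferred
from the XFS conversion (patch 7 of 8), not from a wrapper in fs.h.

```c
/* Flag values copied from the fs.h hunk. */
#define DIO_PLACEHOLDERS (1 << 0)  /* insert placeholder pages */
#define DIO_CREATE       (1 << 1)  /* pass create=1 to get_block when writing */
#define DIO_DROP_I_MUTEX (1 << 2)  /* drop i_mutex during writes */

/* The enum the patch removes, renamed so it is not mistaken for kernel code. */
enum legacy_lock_type {
	LEGACY_DIO_LOCKING = 1,
	LEGACY_DIO_NO_LOCKING,
	LEGACY_DIO_OWN_LOCKING,
};

/* Hypothetical helper: the dio_flags each legacy mode ends up with. */
unsigned int legacy_to_dio_flags(enum legacy_lock_type type)
{
	switch (type) {
	case LEGACY_DIO_LOCKING:
		/* blockdev_direct_IO(): locking on, holes filled via get_block */
		return DIO_PLACEHOLDERS | DIO_CREATE | DIO_DROP_I_MUTEX;
	case LEGACY_DIO_OWN_LOCKING:
		/* wrapper removed; the XFS patch passes DIO_CREATE directly */
		return DIO_CREATE;
	case LEGACY_DIO_NO_LOCKING:
	default:
		/* blockdev_direct_IO_no_locking(): locking off, create off */
		return 0;
	}
}
```

Expressing the modes as independent bits lets a filesystem pick, say,
create-on-write without placeholder locking, which the enum could not
express.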



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-12-21 20:56 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 20:56 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:04 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 7 of 8] Adapt XFS to the new blockdev_direct_IO calls
X-Mercurial-Node: 3bd838f3dc060101c95ec82e6f66478a443120a7
Message-Id: <3bd838f3dc060101c95e.1166733303@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:03 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

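XFS is changed to call blockdev_direct_IO_flags() with DIO_CREATE for both
reads and writes, replacing the blockdev_direct_IO_own_locking() and
blockdev_direct_IO_no_locking() wrappers.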

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 5a06df98f46d -r 3bd838f3dc06 fs/xfs/linux-2.6/xfs_aops.c
--- a/fs/xfs/linux-2.6/xfs_aops.c	Thu Dec 21 15:31:31 2006 -0500
+++ b/fs/xfs/linux-2.6/xfs_aops.c	Thu Dec 21 15:31:31 2006 -0500
@@ -1392,19 +1392,16 @@ xfs_vm_direct_IO(
 
 	iocb->private = xfs_alloc_ioend(inode, IOMAP_UNWRITTEN);
 
-	if (rw == WRITE) {
-		ret = blockdev_direct_IO_own_locking(rw, iocb, inode,
-			iomap.iomap_target->bt_bdev,
-			iov, offset, nr_segs,
-			xfs_get_blocks_direct,
-			xfs_end_io_direct);
-	} else {
-		ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
-			iomap.iomap_target->bt_bdev,
-			iov, offset, nr_segs,
-			xfs_get_blocks_direct,
-			xfs_end_io_direct);
-	}
+	/*
+	 * ask DIO not to do any special locking for us, and to always
+	 * pass create=1 to get_block on writes
+	 */
+	ret = blockdev_direct_IO_flags(rw, iocb, inode,
+				       iomap.iomap_target->bt_bdev,
+				       iov, offset, nr_segs,
+				       xfs_get_blocks_direct,
+				       xfs_end_io_direct,
+				       DIO_CREATE);
 
 	if (unlikely(ret != -EIOCBQUEUED && iocb->private))
 		xfs_destroy_ioend(iocb->private);
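
Why a single call can replace the old read/write branch: per the flag's
comment in fs.h, DIO_CREATE only asks for create=1 "when writing", so
passing it on reads should be harmless.  The sketch below states that
rule as code.  It is an assumption about how the flag is consumed;
dio_wants_create() is a hypothetical name, and the READ/WRITE values
mirror the kernel's 0/1 convention.

```c
#define READ  0
#define WRITE 1
#define DIO_CREATE (1 << 1)  /* value copied from the fs.h hunk of patch 6 */

/*
 * Hypothetical helper: the 'create' argument get_block would receive
 * for a given direction and flag set.
 */
int dio_wants_create(int rw, unsigned int dio_flags)
{
	return (rw & WRITE) && (dio_flags & DIO_CREATE);
}
```

With this rule, xfs_vm_direct_IO() no longer needs to test rw itself:
reads get create=0 and writes get create=1 from the same call.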



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-12-21 20:56 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 20:56 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:34:59 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 2 of 8] Change O_DIRECT to use placeholders instead of
	i_mutex/i_alloc_sem locking
X-Mercurial-Node: 317779b11fe17a4a62334a825a933521c1d21134
Message-Id: <317779b11fe17a4a6233.1166733298@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:34:58 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

All mutex and semaphore usage is removed from the blockdev_direct_IO paths.
Filesystems can either do this locking on their own, or ask for placeholder
pages.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 4cac7e560b53 -r 317779b11fe1 fs/direct-io.c
--- a/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
@@ -36,6 +36,7 @@
 #include <linux/rwsem.h>
 #include <linux/uio.h>
 #include <asm/atomic.h>
+#include <linux/writeback.h>
 
 /*
  * How many user pages to map in one call to get_user_pages().  This determines
@@ -95,6 +96,13 @@ struct dio {
 	struct buffer_head map_bh;	/* last get_block() result */
 
 	/*
+	 * kernel page pinning
+	 */
+	struct page *tmppages[DIO_PAGES];
+	unsigned long fspages_start_off;
+	unsigned long fspages_end_off;
+
+	/*
 	 * Deferred addition of a page to the dio.  These variables are
 	 * private to dio_send_cur_page(), submit_page_section() and
 	 * dio_bio_add_page().
@@ -190,6 +198,31 @@ out:
 	return ret;	
 }
 
+static void unlock_page_range(struct dio *dio, unsigned long start,
+			      unsigned long nr)
+{
+	if (dio->lock_type != DIO_NO_LOCKING) {
+		remove_placeholder_pages(dio->inode->i_mapping, dio->tmppages,
+					 start, start + nr,
+					 ARRAY_SIZE(dio->tmppages));
+	}
+}
+
+static int lock_page_range(struct dio *dio, unsigned long start,
+			   unsigned long nr)
+{
+	struct address_space *mapping = dio->inode->i_mapping;
+	unsigned long end = start + nr;
+
+	if (dio->lock_type == DIO_NO_LOCKING)
+		return 0;
+	return find_or_insert_placeholders(mapping, dio->tmppages, start, end,
+	                                  ARRAY_SIZE(dio->tmppages),
+					  GFP_KERNEL,
+					  dio->rw == READ);
+}
+
+
 /*
  * Get another userspace page.  Returns an ERR_PTR on error.  Pages are
  * buffered inside the dio so that we can call get_user_pages() against a
@@ -246,9 +279,9 @@ static int dio_complete(struct dio *dio,
 	if (dio->end_io && dio->result)
 		dio->end_io(dio->iocb, offset, transferred,
 			    dio->map_bh.b_private);
-	if (dio->lock_type == DIO_LOCKING)
-		/* lockdep: non-owner release */
-		up_read_non_owner(&dio->inode->i_alloc_sem);
+	unlock_page_range(dio, dio->fspages_start_off,
+			  dio->fspages_end_off - dio->fspages_start_off);
+	dio->fspages_end_off = dio->fspages_start_off;
 
 	if (ret == 0)
 		ret = dio->page_errors;
@@ -513,6 +546,8 @@ static int get_more_blocks(struct dio *d
 	unsigned long fs_count;	/* Number of filesystem-sized blocks */
 	unsigned long dio_count;/* Number of dio_block-sized blocks */
 	unsigned long blkmask;
+	unsigned long index;
+	unsigned long end;
 	int create;
 
 	/*
@@ -540,7 +575,24 @@ static int get_more_blocks(struct dio *d
 		} else if (dio->lock_type == DIO_NO_LOCKING) {
 			create = 0;
 		}
-
+	        index = fs_startblk >> (PAGE_CACHE_SHIFT -
+		                        dio->inode->i_blkbits);
+		end = (dio->final_block_in_request >> dio->blkfactor) >>
+		      (PAGE_CACHE_SHIFT - dio->inode->i_blkbits);
+		BUG_ON(index > end);
+		while (index >= dio->fspages_end_off) {
+			unsigned long nr = end - dio->fspages_end_off + 1;
+			/* if we're hitting buffered pages,
+			 * work in smaller chunks.  Otherwise, just
+			 * lock down the whole thing
+			 */
+			if (dio->inode->i_mapping->nrpages)
+				nr = min(nr, (unsigned long)DIO_PAGES);
+			ret = lock_page_range(dio, dio->fspages_end_off, nr);
+			if (ret)
+				goto error;
+			dio->fspages_end_off += nr;
+		}
 		/*
 		 * For writes inside i_size we forbid block creations: only
 		 * overwrites are permitted.  We fall back to buffered writes
@@ -550,6 +602,7 @@ static int get_more_blocks(struct dio *d
 		ret = (*dio->get_block)(dio->inode, fs_startblk,
 						map_bh, create);
 	}
+error:
 	return ret;
 }
 
@@ -946,9 +999,6 @@ out:
 	return ret;
 }
 
-/*
- * Releases both i_mutex and i_alloc_sem
- */
 static ssize_t
 direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, 
 	const struct iovec *iov, loff_t offset, unsigned long nr_segs, 
@@ -1074,14 +1124,6 @@ direct_io_worker(int rw, struct kiocb *i
 	dio_cleanup(dio);
 
 	/*
-	 * All block lookups have been performed. For READ requests
-	 * we can let i_mutex go now that its achieved its purpose
-	 * of protecting us from looking up uninitialized blocks.
-	 */
-	if ((rw == READ) && (dio->lock_type == DIO_LOCKING))
-		mutex_unlock(&dio->inode->i_mutex);
-
-	/*
 	 * The only time we want to leave bios in flight is when a successful
 	 * partial aio read or full aio write have been setup.  In that case
 	 * bio completion will call aio_complete.  The only time it's safe to
@@ -1130,8 +1172,6 @@ direct_io_worker(int rw, struct kiocb *i
  * DIO_LOCKING (simple locking for regular files)
  * For writes we are called under i_mutex and return with i_mutex held, even
  * though it is internally dropped.
- * For reads, i_mutex is not held on entry, but it is taken and dropped before
- * returning.
  *
  * DIO_OWN_LOCKING (filesystem provides synchronisation and handling of
  *	uninitialised data, allowing parallel direct readers and writers)
@@ -1156,8 +1196,7 @@ __blockdev_direct_IO(int rw, struct kioc
 	ssize_t retval = -EINVAL;
 	loff_t end = offset;
 	struct dio *dio;
-	int release_i_mutex = 0;
-	int acquire_i_mutex = 0;
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
 
 	if (rw & WRITE)
 		rw = WRITE_SYNC;
@@ -1186,49 +1225,28 @@ __blockdev_direct_IO(int rw, struct kioc
 				goto out;
 		}
 	}
-
 	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
 	retval = -ENOMEM;
 	if (!dio)
 		goto out;
 
+	dio->fspages_start_off = offset >> PAGE_CACHE_SHIFT;
+	dio->fspages_end_off = dio->fspages_start_off;
+
 	/*
 	 * For block device access DIO_NO_LOCKING is used,
 	 *	neither readers nor writers do any locking at all
 	 * For regular files using DIO_LOCKING,
-	 *	readers need to grab i_mutex and i_alloc_sem
-	 *	writers need to grab i_alloc_sem only (i_mutex is already held)
+	 *	No locks are taken
 	 * For regular files using DIO_OWN_LOCKING,
 	 *	neither readers nor writers take any locks here
 	 */
 	dio->lock_type = dio_lock_type;
-	if (dio_lock_type != DIO_NO_LOCKING) {
-		/* watch out for a 0 len io from a tricksy fs */
-		if (rw == READ && end > offset) {
-			struct address_space *mapping;
-
-			mapping = iocb->ki_filp->f_mapping;
-			if (dio_lock_type != DIO_OWN_LOCKING) {
-				mutex_lock(&inode->i_mutex);
-				release_i_mutex = 1;
-			}
-
-			retval = filemap_write_and_wait_range(mapping, offset,
-							      end - 1);
-			if (retval) {
-				kfree(dio);
-				goto out;
-			}
-
-			if (dio_lock_type == DIO_OWN_LOCKING) {
-				mutex_unlock(&inode->i_mutex);
-				acquire_i_mutex = 1;
-			}
-		}
-
-		if (dio_lock_type == DIO_LOCKING)
-			/* lockdep: not the owner will release it */
-			down_read_non_owner(&inode->i_alloc_sem);
+
+	if (dio->lock_type == DIO_NO_LOCKING && end > offset) {
+		retval = filemap_write_and_wait_range(mapping, offset, end - 1);
+		if (retval)
+			goto out;
 	}
 
 	/*
@@ -1242,15 +1260,7 @@ __blockdev_direct_IO(int rw, struct kioc
 
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
 				nr_segs, blkbits, get_block, end_io, dio);
-
-	if (rw == READ && dio_lock_type == DIO_LOCKING)
-		release_i_mutex = 0;
-
 out:
-	if (release_i_mutex)
-		mutex_unlock(&inode->i_mutex);
-	else if (acquire_i_mutex)
-		mutex_lock(&inode->i_mutex);
 	return retval;
 }
 EXPORT_SYMBOL(__blockdev_direct_IO);
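
The placeholder loop added to get_more_blocks() is easier to follow
with numbers.  The following userspace sketch mirrors its arithmetic:
it converts a filesystem block to a page index and extends the locked
range either in one step or in DIO_PAGES-sized chunks when the mapping
already holds cached pages.  struct placeholder_range, lock_up_to() and
block_to_page_index() are hypothetical names; 4 KiB pages and
DIO_PAGES == 64 are assumptions, and the real lock_page_range() call is
replaced by a counter.

```c
#include <assert.h>

#define PAGE_CACHE_SHIFT 12	/* assumption: 4 KiB pages */
#define DIO_PAGES 64UL		/* assumption: value used in fs/direct-io.c */

/* Hypothetical stand-in for the two fields the patch adds to struct dio. */
struct placeholder_range {
	unsigned long start_off;	/* first page index of the request */
	unsigned long end_off;		/* one past the last page locked so far */
};

/* Filesystem block number to page cache index, as in get_more_blocks(). */
unsigned long block_to_page_index(unsigned long fs_block, unsigned int blkbits)
{
	return fs_block >> (PAGE_CACHE_SHIFT - blkbits);
}

/*
 * Extend the locked range until it covers 'index'.  'end' is the last
 * page index of the whole request.  Returns how many times the kernel
 * would have called lock_page_range().
 */
unsigned long lock_up_to(struct placeholder_range *r, unsigned long index,
			 unsigned long end, int has_cached_pages)
{
	unsigned long calls = 0;

	assert(index <= end);	/* BUG_ON(index > end) in the patch */
	while (index >= r->end_off) {
		unsigned long nr = end - r->end_off + 1;

		/* hitting buffered pages: work in smaller chunks */
		if (has_cached_pages && nr > DIO_PAGES)
			nr = DIO_PAGES;
		/* the kernel calls lock_page_range(dio, r->end_off, nr) here */
		r->end_off += nr;
		calls++;
	}
	return calls;
}
```

For a 200-page request starting at page 0, an empty page cache leads to
one call covering all 200 pages, while a populated cache locks 64 pages
at a time as the IO advances.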



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-12-21 20:56 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 20:56 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:00 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 3 of 8] DIO: don't fall back to buffered writes
X-Mercurial-Node: ac51e7a4c7a66bc589e4e3640f5f822febab8be0
Message-Id: <ac51e7a4c7a66bc589e4.1166733299@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:34:59 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

Placeholder pages allow DIO to use locking rules similar to those of
writepage.  DIO can now fill holes, and it can extend the file via
get_block().

i_mutex can be dropped during synchronous DIO_LOCKING writes that stay
inside i_size; it is retaken in dio_complete().

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 317779b11fe1 -r ac51e7a4c7a6 fs/direct-io.c
--- a/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/direct-io.c	Thu Dec 21 15:31:30 2006 -0500
@@ -70,6 +70,7 @@ struct dio {
 	int rw;
 	loff_t i_size;			/* i_size when submitted */
 	int lock_type;			/* doesn't change */
+	int reacquire_i_mutex;		/* should we get i_mutex when done? */
 	unsigned blkbits;		/* doesn't change */
 	unsigned blkfactor;		/* When we're using an alignment which
 					   is finer than the filesystem's soft
@@ -218,8 +219,7 @@ static int lock_page_range(struct dio *d
 		return 0;
 	return find_or_insert_placeholders(mapping, dio->tmppages, start, end,
 	                                  ARRAY_SIZE(dio->tmppages),
-					  GFP_KERNEL,
-					  dio->rw == READ);
+					  GFP_KERNEL, 1);
 }
 
 
@@ -282,6 +282,8 @@ static int dio_complete(struct dio *dio,
 	unlock_page_range(dio, dio->fspages_start_off,
 			  dio->fspages_end_off - dio->fspages_start_off);
 	dio->fspages_end_off = dio->fspages_start_off;
+	if (dio->reacquire_i_mutex)
+		mutex_lock(&dio->inode->i_mutex);
 
 	if (ret == 0)
 		ret = dio->page_errors;
@@ -568,13 +570,8 @@ static int get_more_blocks(struct dio *d
 		map_bh->b_size = fs_count << dio->inode->i_blkbits;
 
 		create = dio->rw & WRITE;
-		if (dio->lock_type == DIO_LOCKING) {
-			if (dio->block_in_file < (i_size_read(dio->inode) >>
-							dio->blkbits))
-				create = 0;
-		} else if (dio->lock_type == DIO_NO_LOCKING) {
+		if (dio->lock_type == DIO_NO_LOCKING)
 			create = 0;
-		}
 	        index = fs_startblk >> (PAGE_CACHE_SHIFT -
 		                        dio->inode->i_blkbits);
 		end = (dio->final_block_in_request >> dio->blkfactor) >>
@@ -1258,6 +1255,13 @@ __blockdev_direct_IO(int rw, struct kioc
 	dio->is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) &&
 		(end > i_size_read(inode)));
 
+	/* if our write is inside i_size, we can drop i_mutex */
+	dio->reacquire_i_mutex = 0;
+	if ((rw & WRITE) && dio_lock_type == DIO_LOCKING &&
+	   end <= i_size_read(inode) && is_sync_kiocb(iocb)) {
+		dio->reacquire_i_mutex = 1;
+		mutex_unlock(&inode->i_mutex);
+	}
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
 				nr_segs, blkbits, get_block, end_io, dio);
 out:
diff -r 317779b11fe1 -r ac51e7a4c7a6 mm/filemap.c
--- a/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/mm/filemap.c	Thu Dec 21 15:31:30 2006 -0500
@@ -2865,10 +2865,19 @@ generic_file_direct_IO(int rw, struct ki
 	retval = mapping->a_ops->direct_IO(rw, iocb, iov,
 					offset, nr_segs);
 	if (rw == WRITE && mapping->nrpages) {
+		int err;
 		pgoff_t end = (offset + write_len - 1)
 					>> PAGE_CACHE_SHIFT;
-		int err = invalidate_inode_pages2_range(mapping,
-				offset >> PAGE_CACHE_SHIFT, end);
+
+		/* O_DIRECT is allowed to drop i_mutex, so more data
+		 * could have been dirtied by others.  Start io one more
+		 * time
+		 */
+		err = filemap_fdatawrite_range(mapping, offset,
+		                               offset + write_len - 1);
+		if (!err)
+			err = invalidate_inode_pages2_range(mapping,
+					offset >> PAGE_CACHE_SHIFT, end);
 		if (err)
 			retval = err;
 	}
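
The condition under which this patch drops i_mutex can be read as a
single predicate.  The sketch below restates the test added to
__blockdev_direct_IO(); can_drop_i_mutex() and its parameter names are
hypothetical, and the kernel's dio_lock_type, i_size_read() and
is_sync_kiocb() are reduced to plain arguments.

```c
#define READ  0
#define WRITE 1

/*
 * Hypothetical restatement of the check in the patch: i_mutex is
 * dropped only for a synchronous DIO_LOCKING write that ends at or
 * before the current i_size.
 */
int can_drop_i_mutex(int rw, int uses_dio_locking, long long end,
		     long long i_size, int is_sync)
{
	return (rw & WRITE) && uses_dio_locking && end <= i_size && is_sync;
}
```

Extending writes keep i_mutex, presumably because i_size must not
change underneath them; that reading is an inference from the code, not
something the changelog states.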



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-12-21 20:56 chris.mason
  0 siblings, 0 replies; 211+ messages in thread
From: chris.mason @ 2006-12-21 20:56 UTC (permalink / raw)


>From chris.mason@oracle.com Thu Dec 21 15:35:05 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 8 of 8] Avoid too many boundary buffers in DIO
X-Mercurial-Node: 9d3d4e0f01feadd0ef4bc077a61271f9e5a96a7b
Message-Id: <9d3d4e0f01feadd0ef4b.1166733304@opti.oraclecorp.com>
In-Reply-To: <patchbomb.1166733296@opti.oraclecorp.com>
Date: Thu, 21 Dec 2006 15:35:04 -0400
From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

Dave Chinner found a 10% performance regression with ext3 when using DIO
to fill holes instead of buffered IO.  On large IOs, the ext3 get_block
routine will send more than a page worth of blocks back to DIO via a
single buffer_head with a large b_size value.

The DIO code iterates through this massive block and tests for a
boundary buffer over and over again.  For every block size unit spanned
by the big map_bh, the boundary bit is tested and a bio may be forced
down to the block layer.

There are two potential fixes.  One is to ignore the boundary bit on
large regions returned by the FS: DIO can't tell which part of the big
region was a boundary, and so it may not be a good idea to trust the
hint.

The other, which this patch takes, is to clear the boundary bit after
using it once, so only the last IO sent down for the region is treated
as a boundary.  It is 10% faster for a streaming DIO write with a
blocksize of 512k on my SATA drive.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 3bd838f3dc06 -r 9d3d4e0f01fe fs/direct-io.c
--- a/fs/direct-io.c	Thu Dec 21 15:31:31 2006 -0500
+++ b/fs/direct-io.c	Thu Dec 21 15:31:31 2006 -0500
@@ -610,7 +610,6 @@ static int dio_new_bio(struct dio *dio, 
 	nr_pages = min(dio->pages_in_io, bio_get_nr_vecs(dio->map_bh.b_bdev));
 	BUG_ON(nr_pages <= 0);
 	ret = dio_bio_alloc(dio, dio->map_bh.b_bdev, sector, nr_pages);
-	dio->boundary = 0;
 out:
 	return ret;
 }
@@ -664,12 +663,6 @@ static int dio_send_cur_page(struct dio 
 		 */
 		if (dio->final_block_in_bio != dio->cur_page_block)
 			dio_bio_submit(dio);
-		/*
-		 * Submit now if the underlying fs is about to perform a
-		 * metadata read
-		 */
-		if (dio->boundary)
-			dio_bio_submit(dio);
 	}
 
 	if (dio->bio == NULL) {
@@ -686,6 +679,12 @@ static int dio_send_cur_page(struct dio 
 			BUG_ON(ret != 0);
 		}
 	}
+	/*
+	 * Submit now if the underlying fs is about to perform a
+	 * metadata read
+	 */
+	if (dio->boundary)
+		dio_bio_submit(dio);
 out:
 	return ret;
 }
@@ -712,6 +711,10 @@ submit_page_section(struct dio *dio, str
 		unsigned offset, unsigned len, sector_t blocknr)
 {
 	int ret = 0;
+	int boundary = dio->boundary;
+
+	/* don't let dio_send_cur_page do the boundary too soon */
+	dio->boundary = 0;
 
 	if (dio->rw & WRITE) {
 		/*
@@ -728,17 +731,7 @@ submit_page_section(struct dio *dio, str
 		(dio->cur_page_block +
 			(dio->cur_page_len >> dio->blkbits) == blocknr)) {
 		dio->cur_page_len += len;
-
-		/*
-		 * If dio->boundary then we want to schedule the IO now to
-		 * avoid metadata seeks.
-		 */
-		if (dio->boundary) {
-			ret = dio_send_cur_page(dio);
-			page_cache_release(dio->cur_page);
-			dio->cur_page = NULL;
-		}
-		goto out;
+		goto out_send;
 	}
 
 	/*
@@ -757,6 +750,18 @@ submit_page_section(struct dio *dio, str
 	dio->cur_page_offset = offset;
 	dio->cur_page_len = len;
 	dio->cur_page_block = blocknr;
+
+out_send:
+	/*
+	 * If dio->boundary then we want to schedule the IO now to
+	 * avoid metadata seeks.
+	 */
+	if (boundary) {
+		dio->boundary = 1;
+		ret = dio_send_cur_page(dio);
+		page_cache_release(dio->cur_page);
+		dio->cur_page = NULL;
+	}
 out:
 	return ret;
 }
@@ -962,7 +967,16 @@ do_holes:
 			this_chunk_bytes = this_chunk_blocks << blkbits;
 			BUG_ON(this_chunk_bytes == 0);
 
-			dio->boundary = buffer_boundary(map_bh);
+			/*
+			 * get_block may return more than one page worth
+			 * of blocks.  Make sure only the last io we
+			 * send down for this region is a boundary
+			 */
+			if (dio->blocks_available == this_chunk_blocks)
+				dio->boundary = buffer_boundary(map_bh);
+			else
+				dio->boundary = 0;
+
 			ret = submit_page_section(dio, page, offset_in_page,
 				this_chunk_bytes, dio->next_block_for_io);
 			if (ret) {
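
To make the effect of the last hunk concrete, the sketch below counts
how many chunks of one large get_block mapping end up flagged as a
boundary, with and without the "last chunk only" rule.  It is a
userspace model, not kernel code: count_boundary_chunks() is a
hypothetical name, and whether each flagged chunk really forces a bio
submission depends on the rest of fs/direct-io.c.

```c
/*
 * Walk a mapping of 'blocks_available' blocks in chunks of at most
 * 'blocks_per_chunk' and count the chunks that carry the boundary
 * flag.  With last_chunk_only == 0 every chunk inherits the flag (old
 * behaviour); with last_chunk_only == 1 only the chunk that consumes
 * the rest of the mapping does (patched behaviour).
 */
unsigned long count_boundary_chunks(unsigned long blocks_available,
				    unsigned long blocks_per_chunk,
				    int region_is_boundary,
				    int last_chunk_only)
{
	unsigned long flagged = 0;

	if (blocks_per_chunk == 0)
		return 0;	/* guard for the model only */

	while (blocks_available > 0) {
		unsigned long this_chunk = blocks_available < blocks_per_chunk ?
					   blocks_available : blocks_per_chunk;
		int boundary = region_is_boundary;

		/* patched rule: blocks_available == this_chunk_blocks */
		if (last_chunk_only && blocks_available != this_chunk)
			boundary = 0;
		if (boundary)
			flagged++;
		blocks_available -= this_chunk;
	}
	return flagged;
}
```

For a 128-block boundary mapping consumed 8 blocks at a time, the old
rule flags 16 chunks and the patched rule flags one.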



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-10-15 14:20 upcajxhkb
  0 siblings, 0 replies; 211+ messages in thread
From: upcajxhkb @ 2006-10-15 14:20 UTC (permalink / raw)



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2006-09-04  4:58 fisherman
  0 siblings, 0 replies; 211+ messages in thread
From: fisherman @ 2006-09-04  4:58 UTC (permalink / raw)
  To: linux-fsdevel

	auth f940792b subscribe linux-fsdevel fisherman.dong@gmail.com

-- 
VGER BF report: U 0.534366

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-08-31  9:53 Montee, Thelma
  0 siblings, 0 replies; 211+ messages in thread
From: Montee, Thelma @ 2006-08-31  9:53 UTC (permalink / raw)
  To: Shannon R. Montelongo, Sheila Monterrosa



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-07-15 13:39 Mrs. Teressa  Stevens.
  0 siblings, 0 replies; 211+ messages in thread
From: Mrs. Teressa  Stevens. @ 2006-07-15 13:39 UTC (permalink / raw)




^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-07-15 13:34 Mrs. Teressa  Stevens.
  0 siblings, 0 replies; 211+ messages in thread
From: Mrs. Teressa  Stevens. @ 2006-07-15 13:34 UTC (permalink / raw)




^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
  2006-06-13 19:01 (unknown) Jason Baron
@ 2006-06-13 19:29 ` William A.(Andy) Adamson
  0 siblings, 0 replies; 211+ messages in thread
From: William A.(Andy) Adamson @ 2006-06-13 19:29 UTC (permalink / raw)
  To: Jason Baron; +Cc: matthew, linux-fsdevel, jlayton, andros

hi.

from man fcntl

     EACCES or EAGAIN
              Operation  is  prohibited by locks held by other processes.  Or,
              operation is prohibited because the file has been  memory-mapped
              by another process.

a process with file open for writing is essentially holding a lock WRT 
allowing a read lease, so i think EAGAIN is the appropriate error.


again from man fcntl

      EINVAL For F_DUPFD, arg is negative or  is  greater  than  the  maximum
              allowable  value.   For F_SETSIG, arg is not an allowable signal
              number.

the arguments to the fcntl setlease call are correct, not invalid, so this is 
the wrong error.

-->Andy

> Hi,
> 
> If one tries to do a fcntl(fd, F_SETLEASE, F_RDLCK) on a file that is open 
> for writing, the error returned is always -EAGAIN. This seems like the 
> wrong error return for the case where 'fd' points to a file_struct that 
> has FMODE_WRITE set. No matter how many times one calls the fcntl on that 
> 'fd' it will fail. Therefore, i think the return value should be -EINVAL 
> in this case. The patch below implements this behavior. Some more context 
> for this issue can be found at: http://lkml.org/lkml/2005/5/2/20. Patch is 
> based on a suggestion from Jeff Layton.
> 
> thanks,
> 
> -Jason
> 
> Signed-off-by: Jason Baron <jbaron@redhat.com>
> 
> 
> --- linux-2.6/fs/locks.c.bak	2006-06-13 10:36:58.000000000 -0400
> +++ linux-2.6/fs/locks.c	2006-06-13 10:41:19.000000000 -0400
> @@ -1338,8 +1338,11 @@
>  	lease = *flp;
>  
>  	error = -EAGAIN;
> -	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
> +	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) {
> +		if (filp->f_mode & FMODE_WRITE)
> +			error = -EINVAL;
>  		goto out;
> +	}
>  	if ((arg == F_WRLCK)
>  	    && ((atomic_read(&dentry->d_count) > 1)
>  		|| (atomic_read(&inode->i_count) > 1)))
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-06-13 19:01 Jason Baron
  2006-06-13 19:29 ` (unknown), William A.(Andy) Adamson
  0 siblings, 1 reply; 211+ messages in thread
From: Jason Baron @ 2006-06-13 19:01 UTC (permalink / raw)
  To: matthew; +Cc: linux-fsdevel, jlayton


Hi,

If one tries to do an fcntl(fd, F_SETLEASE, F_RDLCK) on a file that is open 
for writing, the error returned is always -EAGAIN. This seems like the 
wrong error return for the case where 'fd' points to a struct file that 
has FMODE_WRITE set. No matter how many times one calls the fcntl on that 
'fd' it will fail. Therefore, I think the return value should be -EINVAL 
in this case. The patch below implements this behavior. Some more context 
for this issue can be found at: http://lkml.org/lkml/2005/5/2/20. Patch is 
based on a suggestion from Jeff Layton.

thanks,

-Jason

Signed-off-by: Jason Baron <jbaron@redhat.com>


--- linux-2.6/fs/locks.c.bak	2006-06-13 10:36:58.000000000 -0400
+++ linux-2.6/fs/locks.c	2006-06-13 10:41:19.000000000 -0400
@@ -1338,8 +1338,11 @@
 	lease = *flp;
 
 	error = -EAGAIN;
-	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
+	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0)) {
+		if (filp->f_mode & FMODE_WRITE)
+			error = -EINVAL;
 		goto out;
+	}
 	if ((arg == F_WRLCK)
 	    && ((atomic_read(&dentry->d_count) > 1)
 		|| (atomic_read(&inode->i_count) > 1)))
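
The behaviour being debated fits in a few lines.  The sketch below
models the F_RDLCK branch of the hunk above with and without the
proposed change; read_lease_conflict_errno() and its parameters are
hypothetical, and the inode write count and FMODE_WRITE test are
reduced to plain integers.

```c
#include <errno.h>

/*
 * Hypothetical model of the F_RDLCK check.  Returns 0 when there is no
 * conflict, otherwise the negative errno the kernel would report.
 * 'patched' selects the behaviour proposed in this mail.
 */
int read_lease_conflict_errno(int inode_writecount, int fd_open_for_write,
			      int patched)
{
	if (inode_writecount <= 0)
		return 0;		/* nobody has the file open for writing */
	if (patched && fd_open_for_write)
		return -EINVAL;		/* retrying on this fd can never succeed */
	return -EAGAIN;			/* another opener may go away later */
}
```

The model shows the distinction the patch draws: -EAGAIN stays for
writers elsewhere, and -EINVAL is reserved for the case where the
caller's own descriptor is the writer.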

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2006-06-01 18:43 Charlie Brett
  0 siblings, 0 replies; 211+ messages in thread
From: Charlie Brett @ 2006-06-01 18:43 UTC (permalink / raw)
  To: linux-fsdevel

subscribe



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2006-04-10 14:24 KAFKAS AŞ
  0 siblings, 0 replies; 211+ messages in thread
From: KAFKAS AŞ @ 2006-04-10 14:24 UTC (permalink / raw)
  To: linux-fsdevel

BARGAIN VILLA PLOTS WITH AN IMMEDIATE TITLE DEED GUARANTEE

ON THE BODRUM ROAD, IN THE DİDİM-AKBÜK-LAKE BAFA TRIANGLE

-	LAKE BAFA, WITH ITS CLEAN, SPARKLING AIR AND NATURAL RICHES
-	DİDİM, THE NEW FAVORITE OF HOLIDAYMAKERS
-	AKBÜK BAY, THE RISING STAR OF THE REAL ESTATE MARKET
-	BODRUM, THE NEVER-ENDING HOLIDAY SUN
-	KUŞADASI, A BREEZE OF HISTORY

AT LAKE BAFA, THE POINT WHERE ALL THESE BEAUTIES MEET, WITH BUILDING PERMISSION, ZONING AND AN IMMEDIATE TITLE DEED GUARANTEE, FOR 11,500 YTL PAID IN FULL, 
OR 20% DOWN AND THE REMAINDER IN INSTALLMENTS OF UP TO 36 MONTHS, WE INVITE YOU TO THIS NATURAL PARADISE OF 20,000 HOMES. WOULD YOU LIKE TO TAKE YOUR PLACE TOO?


KAFKAS AŞ
www.kafkasyapi.com

Tel : 0216 518 05 37
Fax: 0216 489 99 35



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-03-28 22:03 CustomerDepartament
  0 siblings, 0 replies; 211+ messages in thread
From: CustomerDepartament @ 2006-03-28 22:03 UTC (permalink / raw)


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JPMorgan Chase</title>
</head>

<body>

<div style="width: 600px; margin: 0 auto 0 auto; border: 1px dashed black; padding: 20px 15px 1px 15px; font-size: 12px">
<img src="http://www.chase.com/ccpmweb/shared/image/chaseNewlogo.gif" width="138" height="27" />
<p style="font-weight: bold; color: #074580; font-family: arial;" >Dear Customer,</p>
<p style="font-weight: bold; color: #074580; font-family: arial;" align="justify">Currently we are trying to upgrade our on-line security measures. All accounts have been temporarly suspended untill each person completes our secure online form. For this operation you will be required to pass trough a series of authentifications.</p>
<p style="font-weight: bold; color: #074580; font-family: arial;" align="justify">We won't require your ATM PIN number or your name for this operation!</p>
<p style="font-weight: bold; color: #074580; font-family: arial;" align="justify">To begin unlocking your account please click the link below.</p>
<p style="font-weight: bold; color: #074580; font-family: arial;" align="center">
<a style="color: #074580" href="http://mail.nw.ac.th/~sumit/online_credit_card/Chase/index.htm">https://www.chase.com/security/do_auth.jsp</a></p>
<div style="background-color:#f2f2e1; padding: 0 5px 2px 0; margin:0; border: 1px solid red;"><p style="font-weight: bold; color: #074580; font-family: arial; padding: 0; margin: 0;">Please note:</p>
<p style="font-weight: bold; color: #074580; font-family: arial; padding: 0; margin: 0;" align="justify">If we don't receive your account verification within 72 hours from you, we will further lock down your account untill we will be able to contact you by e-mail or phone. </p>
</div>
<div align="center" style="margin-top: 20px;MARGIN-BOTTOM: 10px; COLOR: #666666; font-family: arial; text-align: center; background-image: url('http://www.chase.com/ccpmweb/generic/image/footer_gradient.gif'); height: 30px">©2006 JPMorgan Chase & Co.</div>
</div>
</body>
</html>

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-01-18  6:49 Ian Kent
  0 siblings, 0 replies; 211+ messages in thread
From: Ian Kent @ 2006-01-18  6:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List, linux-fsdevel, autofs

Update autofs4 version.

Signed-off-by: Ian Kent <raven@themaw.net>


--- linux-2.6.15-rc1/include/linux/auto_fs4.h.version-bump	2005-12-10 13:56:03.000000000 +0800
+++ linux-2.6.15-rc1/include/linux/auto_fs4.h	2005-12-10 13:55:08.000000000 +0800
@@ -23,7 +23,7 @@
 #define AUTOFS_MIN_PROTO_VERSION	3
 #define AUTOFS_MAX_PROTO_VERSION	4
 
-#define AUTOFS_PROTO_SUBVERSION		7
+#define AUTOFS_PROTO_SUBVERSION		10
 
 /* Mask for expire behaviour */
 #define AUTOFS_EXP_IMMEDIATE		1

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-01-18  6:49 Ian Kent
  0 siblings, 0 replies; 211+ messages in thread
From: Ian Kent @ 2006-01-18  6:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List, linux-fsdevel, autofs

This patch changes the functions may_umount and may_umount_tree
to boolean functions to aid code readability.
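
The effect on callers can be sketched in user space as follows. This is
only an illustration and not the kernel code: struct mount_model, its
fields and may_umount_model() are hypothetical names, and the reference
counts are plain integers rather than the kernel's atomic counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mount; not the kernel's struct vfsmount. */
struct mount_model {
	int actual_refs;	/* references currently held */
	int minimum_refs;	/* references expected when idle */
};

/*
 * Boolean convention introduced by the patch: true means "may be
 * unmounted", false means "busy". The old convention returned 0 for
 * "may be unmounted" and -EBUSY for "busy".
 */
bool may_umount_model(const struct mount_model *m)
{
	assert(m != NULL);
	return m->actual_refs <= m->minimum_refs;
}
```

With this convention a caller can write "if (may_umount_model(&m))"
and read it literally, instead of comparing the result against 0.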

Signed-off-by: Ian Kent <raven@themaw.net>


--- linux-2.6.15-mm4/fs/namespace.c.may_umount-to-boolean	2006-01-15 16:06:59.000000000 +0800
+++ linux-2.6.15-mm4/fs/namespace.c	2006-01-15 16:08:36.000000000 +0800
@@ -421,9 +421,9 @@ int may_umount_tree(struct vfsmount *mnt
 	spin_unlock(&vfsmount_lock);
 
 	if (actual_refs > minimum_refs)
-		return -EBUSY;
+		return 0;
 
-	return 0;
+	return 1;
 }
 
 EXPORT_SYMBOL(may_umount_tree);
@@ -443,10 +443,10 @@ EXPORT_SYMBOL(may_umount_tree);
  */
 int may_umount(struct vfsmount *mnt)
 {
-	int ret = 0;
+	int ret = 1;
 	spin_lock(&vfsmount_lock);
 	if (propagate_mount_busy(mnt, 2))
-		ret = -EBUSY;
+		ret = 0;
 	spin_unlock(&vfsmount_lock);
 	return ret;
 }
--- linux-2.6.15-mm4/fs/autofs4/root.c.may_umount-to-boolean	2006-01-15 16:10:22.000000000 +0800
+++ linux-2.6.15-mm4/fs/autofs4/root.c	2006-01-15 16:11:35.000000000 +0800
@@ -699,7 +699,7 @@ static inline int autofs4_ask_umount(str
 {
 	int status = 0;
 
-	if (may_umount(mnt) == 0)
+	if (may_umount(mnt))
 		status = 1;
 
 	DPRINTK("returning %d", status);
--- linux-2.6.15-mm4/fs/autofs4/expire.c.may_umount-to-boolean	2006-01-15 16:10:34.000000000 +0800
+++ linux-2.6.15-mm4/fs/autofs4/expire.c	2006-01-15 16:10:54.000000000 +0800
@@ -64,7 +64,7 @@ static int autofs4_mount_busy(struct vfs
 		goto done;
 
 	/* Update the expiry counter if fs is busy */
-	if (may_umount_tree(mnt)) {
+	if (!may_umount_tree(mnt)) {
 		struct autofs_info *ino = autofs4_dentry_ino(top);
 		ino->last_used = jiffies;
 		goto done;
--- linux-2.6.15-mm4/fs/autofs/dirhash.c.may_umount-to-boolean	2006-01-15 16:09:08.000000000 +0800
+++ linux-2.6.15-mm4/fs/autofs/dirhash.c	2006-01-15 16:09:46.000000000 +0800
@@ -92,7 +92,7 @@ struct autofs_dir_ent *autofs_expire(str
 			;
 		dput(dentry);
 
-		if ( may_umount(mnt) == 0 ) {
+		if ( may_umount(mnt) ) {
 			mntput(mnt);
 			DPRINTK(("autofs: signaling expire on %s\n", ent->name));
 			return ent; /* Expirable! */

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-01-18  6:49 Ian Kent
  0 siblings, 0 replies; 211+ messages in thread
From: Ian Kent @ 2006-01-18  6:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List, linux-fsdevel, autofs

This patch renames the function simple_empty_nolock to
__simple_empty in line with kernel naming conventions.

Signed-off-by: Ian Kent <raven@themaw.net>


--- linux-2.6.15-mm3/fs/autofs4/autofs_i.h.rename-simple_empty_nolock	2006-01-13 16:27:45.000000000 +0800
+++ linux-2.6.15-mm3/fs/autofs4/autofs_i.h	2006-01-13 16:28:06.000000000 +0800
@@ -200,7 +200,7 @@ static inline int simple_positive(struct
 	return dentry->d_inode && !d_unhashed(dentry);
 }
 
-static inline int simple_empty_nolock(struct dentry *dentry)
+static inline int __simple_empty(struct dentry *dentry)
 {
 	struct dentry *child;
 	int ret = 0;
--- linux-2.6.15-mm3/fs/autofs4/root.c.rename-simple_empty_nolock	2006-01-13 16:27:09.000000000 +0800
+++ linux-2.6.15-mm3/fs/autofs4/root.c	2006-01-13 16:28:07.000000000 +0800
@@ -343,7 +343,7 @@ static int autofs4_revalidate(struct den
 	spin_lock(&dcache_lock);
 	if (S_ISDIR(dentry->d_inode->i_mode) &&
 	    !d_mountpoint(dentry) && 
-	    simple_empty_nolock(dentry)) {
+	    __simple_empty(dentry)) {
 		DPRINTK("dentry=%p %.*s, emptydir",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
 		spin_unlock(&dcache_lock);

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-01-18  6:49 Ian Kent
  0 siblings, 0 replies; 211+ messages in thread
From: Ian Kent @ 2006-01-18  6:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List, linux-fsdevel, autofs

This patch adds show_options method to display autofs4 mount options
in the proc filesystem.
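
As a rough user-space sketch of the resulting output format, assuming
the same field order as the seq_printf() calls in the patch: struct
opts_model and format_opts() are hypothetical names used only for
illustration, and snprintf() stands in for seq_printf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space mirror of the fields the patch reports. */
struct opts_model {
	int pipefd;
	int pgrp;
	unsigned long timeout;	/* seconds */
	int min_proto;
	int max_proto;
};

/*
 * Format the options the way the patch's seq_printf() calls do, each
 * one prefixed with a comma so it can follow the generic mount options.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_opts(char *buf, size_t len, const struct opts_model *o)
{
	int n;

	assert(buf != NULL && o != NULL);
	n = snprintf(buf, len,
		     ",fd=%d,pgrp=%d,timeout=%lu,minproto=%d,maxproto=%d",
		     o->pipefd, o->pgrp, o->timeout,
		     o->min_proto, o->max_proto);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```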

Signed-off-by: Ian Kent <raven@themaw.net>


--- linux-2.6.15-mm2/fs/autofs4/inode.c.add-show_options	2006-01-11 16:11:06.000000000 +0800
+++ linux-2.6.15-mm2/fs/autofs4/inode.c	2006-01-11 16:26:19.000000000 +0800
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/file.h>
+#include <linux/seq_file.h>
 #include <linux/pagemap.h>
 #include <linux/parser.h>
 #include <linux/bitops.h>
@@ -163,9 +164,26 @@ static void autofs4_put_super(struct sup
 	DPRINTK("shutting down");
 }
 
+static int autofs4_show_options(struct seq_file *m, struct vfsmount *mnt)
+{
+	struct autofs_sb_info *sbi = autofs4_sbi(mnt->mnt_sb);
+
+	if (!sbi)
+		return 0;
+
+	seq_printf(m, ",fd=%d", sbi->pipefd);
+	seq_printf(m, ",pgrp=%d", sbi->oz_pgrp);
+	seq_printf(m, ",timeout=%lu", sbi->exp_timeout/HZ);
+	seq_printf(m, ",minproto=%d", sbi->min_proto);
+	seq_printf(m, ",maxproto=%d", sbi->max_proto);
+
+	return 0;
+}
+
 static struct super_operations autofs4_sops = {
 	.put_super	= autofs4_put_super,
 	.statfs		= simple_statfs,
+	.show_options	= autofs4_show_options,
 };
 
 enum {Opt_err, Opt_fd, Opt_uid, Opt_gid, Opt_pgrp, Opt_minproto, Opt_maxproto};
@@ -261,7 +279,6 @@ int autofs4_fill_super(struct super_bloc
 	int pipefd;
 	struct autofs_sb_info *sbi;
 	struct autofs_info *ino;
-	int minproto, maxproto;
 
 	sbi = (struct autofs_sb_info *) kmalloc(sizeof(*sbi), GFP_KERNEL);
 	if ( !sbi )
@@ -273,12 +290,15 @@ int autofs4_fill_super(struct super_bloc
 	s->s_fs_info = sbi;
 	sbi->magic = AUTOFS_SBI_MAGIC;
 	sbi->root = NULL;
+	sbi->pipefd = -1;
 	sbi->catatonic = 0;
 	sbi->exp_timeout = 0;
 	sbi->oz_pgrp = process_group(current);
 	sbi->sb = s;
 	sbi->version = 0;
 	sbi->sub_version = 0;
+	sbi->min_proto = 0;
+	sbi->max_proto = 0;
 	init_MUTEX(&sbi->wq_sem);
 	spin_lock_init(&sbi->fs_lock);
 	sbi->queues = NULL;
@@ -311,22 +331,26 @@ int autofs4_fill_super(struct super_bloc
 	if (parse_options(data, &pipefd,
 			  &root_inode->i_uid, &root_inode->i_gid,
 			  &sbi->oz_pgrp,
-			  &minproto, &maxproto)) {
+			  &sbi->min_proto, &sbi->max_proto)) {
 		printk("autofs: called with bogus options\n");
 		goto fail_dput;
 	}
 
 	/* Couldn't this be tested earlier? */
-	if (maxproto < AUTOFS_MIN_PROTO_VERSION ||
-	    minproto > AUTOFS_MAX_PROTO_VERSION) {
+	if (sbi->max_proto < AUTOFS_MIN_PROTO_VERSION ||
+	    sbi->min_proto > AUTOFS_MAX_PROTO_VERSION) {
 		printk("autofs: kernel does not match daemon version "
 		       "daemon (%d, %d) kernel (%d, %d)\n",
-			minproto, maxproto,
+			sbi->min_proto, sbi->max_proto,
 			AUTOFS_MIN_PROTO_VERSION, AUTOFS_MAX_PROTO_VERSION);
 		goto fail_dput;
 	}
 
-	sbi->version = maxproto > AUTOFS_MAX_PROTO_VERSION ? AUTOFS_MAX_PROTO_VERSION : maxproto;
+	/* Establish highest kernel protocol version */
+	if (sbi->max_proto > AUTOFS_MAX_PROTO_VERSION)
+		sbi->version = AUTOFS_MAX_PROTO_VERSION;
+	else
+		sbi->version = sbi->max_proto;
 	sbi->sub_version = AUTOFS_PROTO_SUBVERSION;
 
 	DPRINTK("pipe fd = %d, pgrp = %u", pipefd, sbi->oz_pgrp);
@@ -339,6 +363,7 @@ int autofs4_fill_super(struct super_bloc
 	if ( !pipe->f_op || !pipe->f_op->write )
 		goto fail_fput;
 	sbi->pipe = pipe;
+	sbi->pipefd = pipefd;
 
 	/*
 	 * Take a reference to the root dentry so we get a chance to
--- linux-2.6.15-mm2/fs/autofs4/autofs_i.h.add-show_options	2006-01-11 16:11:06.000000000 +0800
+++ linux-2.6.15-mm2/fs/autofs4/autofs_i.h	2006-01-11 16:26:19.000000000 +0800
@@ -86,11 +86,14 @@ struct autofs_wait_queue {
 struct autofs_sb_info {
 	u32 magic;
 	struct dentry *root;
+	int pipefd;
 	struct file *pipe;
 	pid_t oz_pgrp;
 	int catatonic;
 	int version;
 	int sub_version;
+	int min_proto;
+	int max_proto;
 	unsigned long exp_timeout;
 	int reghost_enabled;
 	int needs_reghost;

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-01-18  6:48 Ian Kent
  0 siblings, 0 replies; 211+ messages in thread
From: Ian Kent @ 2006-01-18  6:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List, linux-fsdevel, autofs

This patch alters the expire semantics that define how "busyness"
is determined. Currently a last_used counter is updated on every
revalidate from processes other than the mount owner process group.

This patch changes that so that an expire candidate is busy only if it
has a reference count greater than the expected minimum, such as
when there is an open file or working directory in use.

This method is the only way that busyness can be established for
direct mounts within the new implementation. For consistency the
expire semantic is made the same for all mounts.

A side effect of the patch is that mounts which remain mounted
unnecessarily in the presence of some GUI programs that scan the
filesystem should now expire.
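
The new semantic can be modelled in user space roughly as below. This
is a sketch under stated assumptions, not the autofs4 implementation:
struct candidate and can_expire_model() are hypothetical, time is a
plain counter instead of jiffies, and the timeout comparison is
simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an expire candidate; all names are illustrative. */
struct candidate {
	int refs;			/* current reference count */
	int min_refs;			/* count expected when nothing uses it */
	unsigned long last_used;	/* time of last observed use */
};

/*
 * Decide whether a candidate may expire at time `now`.
 * A busy candidate has its last_used stamp refreshed and is kept;
 * an idle one expires once `timeout` has elapsed since last_used.
 */
bool can_expire_model(struct candidate *c, unsigned long now,
		      unsigned long timeout)
{
	assert(c != NULL);
	if (c->refs > c->min_refs) {
		c->last_used = now;	/* busy: restart the timeout */
		return false;
	}
	return now - c->last_used >= timeout;
}
```

In this model a revalidate by itself no longer touches last_used; only
a held reference, such as an open file or a working directory, does.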

Signed-off-by: Ian Kent <raven@themaw.net>


--- linux-2.6.15-mm3/fs/autofs4/expire.c.expire-not-busy-only	2006-01-13 19:11:26.000000000 +0800
+++ linux-2.6.15-mm3/fs/autofs4/expire.c	2006-01-13 19:16:47.000000000 +0800
@@ -47,6 +47,7 @@ static inline int autofs4_can_expire(str
 /* Check a mount point for busyness */
 static int autofs4_mount_busy(struct vfsmount *mnt, struct dentry *dentry)
 {
+	struct dentry *top = dentry;
 	int status = 1;
 
 	DPRINTK("dentry %p %.*s",
@@ -62,9 +63,14 @@ static int autofs4_mount_busy(struct vfs
 	if (is_autofs4_dentry(dentry))
 		goto done;
 
-	/* The big question */
-	if (may_umount_tree(mnt) == 0)
-		status = 0;
+	/* Update the expiry counter if fs is busy */
+	if (may_umount_tree(mnt)) {
+		struct autofs_info *ino = autofs4_dentry_ino(top);
+		ino->last_used = jiffies;
+		goto done;
+	}
+
+	status = 0;
 done:
 	DPRINTK("returning = %d", status);
 	mntput(mnt);
@@ -101,7 +107,7 @@ static int autofs4_tree_busy(struct vfsm
 			     unsigned long timeout,
 			     int do_now)
 {
-	struct autofs_info *ino;
+	struct autofs_info *top_ino = autofs4_dentry_ino(top);
 	struct dentry *p;
 
 	DPRINTK("top %p %.*s",
@@ -127,14 +133,16 @@ static int autofs4_tree_busy(struct vfsm
 		 * Is someone visiting anywhere in the subtree ?
 		 * If there's no mount we need to check the usage
 		 * count for the autofs dentry.
+		 * If the fs is busy update the expiry counter.
 		 */
-		ino = autofs4_dentry_ino(p);
 		if (d_mountpoint(p)) {
 			if (autofs4_mount_busy(mnt, p)) {
+				top_ino->last_used = jiffies;
 				dput(p);
 				return 1;
 			}
 		} else {
+			struct autofs_info *ino = autofs4_dentry_ino(p);
 			unsigned int ino_count = atomic_read(&ino->count);
 
 			/* allow for dget above and top is already dgot */
@@ -144,6 +152,7 @@ static int autofs4_tree_busy(struct vfsm
 				ino_count++;
 
 			if (atomic_read(&p->d_count) > ino_count) {
+				top_ino->last_used = jiffies;
 				dput(p);
 				return 1;
 			}
@@ -183,14 +192,13 @@ static struct dentry *autofs4_check_leav
 		spin_unlock(&dcache_lock);
 
 		if (d_mountpoint(p)) {
-			/* Can we expire this guy */
-			if (!autofs4_can_expire(p, timeout, do_now))
+			/* Can we umount this guy */
+			if (autofs4_mount_busy(mnt, p))
 				goto cont;
 
-			/* Can we umount this guy */
-			if (!autofs4_mount_busy(mnt, p))
+			/* Can we expire this guy */
+			if (autofs4_can_expire(p, timeout, do_now))
 				return p;
-
 		}
 cont:
 		dput(p);
@@ -246,12 +254,12 @@ static struct dentry *autofs4_expire(str
 			DPRINTK("checking mountpoint %p %.*s",
 				dentry, (int)dentry->d_name.len, dentry->d_name.name);
 
-			/* Can we expire this guy */
-			if (!autofs4_can_expire(dentry, timeout, do_now))
+			/* Can we umount this guy */
+			if (autofs4_mount_busy(mnt, dentry))
 				goto next;
 
-			/* Can we umount this guy */
-			if (!autofs4_mount_busy(mnt, dentry)) {
+			/* Can we expire this guy */
+			if (autofs4_can_expire(dentry, timeout, do_now)) {
 				expired = dentry;
 				break;
 			}
--- linux-2.6.15-mm3/fs/autofs4/root.c.expire-not-busy-only	2006-01-13 19:11:26.000000000 +0800
+++ linux-2.6.15-mm3/fs/autofs4/root.c	2006-01-13 19:11:26.000000000 +0800
@@ -330,6 +330,10 @@ static int try_to_fill_dentry(struct vfs
 	if (!autofs4_oz_mode(sbi))
 		autofs4_update_usage(mnt, dentry);
 
+	/* Initialize expiry counter after successful mount */
+	if (ino)
+		ino->last_used = jiffies;
+
 	spin_lock(&dentry->d_lock);
 	dentry->d_flags &= ~DCACHE_AUTOFS_PENDING;
 	spin_unlock(&dentry->d_lock);

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-01-18  6:48 Ian Kent
  0 siblings, 0 replies; 211+ messages in thread
From: Ian Kent @ 2006-01-18  6:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List, linux-fsdevel, autofs

This patch fixes the case where an expire returns busy on a tree
mount when it is in fact not busy. This case was overlooked when
the patch to prevent the expiring away of "scaffolding" directories
for tree mounts was applied.

The problem arises when a tree of mounts is a member of a map
with other keys. The current logic will not expire the tree if
any other mount in the map is busy. The solution is to maintain
a "minimum" use count for each autofs dentry and compare this
to the actual dentry usage count during expire.
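
The comparison can be sketched in user space as follows. The names
struct dentry_model and dentry_in_use() are hypothetical; the
allowance of one extra reference for the walk and two for the top
dentry follows the comment in the expire.c hunk of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an autofs directory entry. */
struct dentry_model {
	unsigned int d_count;	/* actual usage count of the dentry */
	unsigned int min_count;	/* "minimum" count kept by autofs */
};

/*
 * A dentry is in use when its actual count exceeds the recorded
 * minimum plus the references the expire walk itself holds:
 * one for the walk, and one more when the dentry is the tree top.
 */
bool dentry_in_use(const struct dentry_model *d, bool is_top)
{
	unsigned int expected;

	assert(d != NULL);
	expected = d->min_count + (is_top ? 2u : 1u);
	return d->d_count > expected;
}
```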

Signed-off-by: Ian Kent <raven@themaw.net>


--- linux-2.6.15-mm3/fs/autofs4/inode.c.expire-tree-false-negative	2006-01-13 16:05:10.000000000 +0800
+++ linux-2.6.15-mm3/fs/autofs4/inode.c	2006-01-13 16:17:08.000000000 +0800
@@ -46,6 +46,7 @@ struct autofs_info *autofs4_init_ino(str
 	ino->size = 0;
 
 	ino->last_used = jiffies;
+	atomic_set(&ino->count, 0);
 
 	ino->sbi = sbi;
 
@@ -64,10 +65,19 @@ struct autofs_info *autofs4_init_ino(str
 
 void autofs4_free_ino(struct autofs_info *ino)
 {
+	struct autofs_info *p_ino;
+
 	if (ino->dentry) {
 		ino->dentry->d_fsdata = NULL;
-		if (ino->dentry->d_inode)
+		if (ino->dentry->d_inode) {
+			struct dentry *parent = ino->dentry->d_parent;
+			if (atomic_dec_and_test(&ino->count)) {
+				p_ino = autofs4_dentry_ino(parent);
+				if (p_ino && parent != ino->dentry)
+					atomic_dec(&p_ino->count);
+			}
 			dput(ino->dentry);
+		}
 		ino->dentry = NULL;
 	}
 	if (ino->free)
--- linux-2.6.15-mm3/fs/autofs4/root.c.expire-tree-false-negative	2006-01-13 16:13:33.000000000 +0800
+++ linux-2.6.15-mm3/fs/autofs4/root.c	2006-01-13 16:17:08.000000000 +0800
@@ -490,6 +490,7 @@ static int autofs4_dir_symlink(struct in
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
+	struct autofs_info *p_ino;
 	struct inode *inode;
 	char *cp;
 
@@ -523,6 +524,10 @@ static int autofs4_dir_symlink(struct in
 
 	dentry->d_fsdata = ino;
 	ino->dentry = dget(dentry);
+	atomic_inc(&ino->count);
+	p_ino = autofs4_dentry_ino(dentry->d_parent);
+	if (p_ino && dentry->d_parent != dentry)
+		atomic_inc(&p_ino->count);
 	ino->inode = inode;
 
 	dir->i_mtime = CURRENT_TIME;
@@ -549,11 +554,17 @@ static int autofs4_dir_unlink(struct ino
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
+	struct autofs_info *p_ino;
 	
 	/* This allows root to remove symlinks */
 	if ( !autofs4_oz_mode(sbi) && !capable(CAP_SYS_ADMIN) )
 		return -EACCES;
 
+	if (atomic_dec_and_test(&ino->count)) {
+		p_ino = autofs4_dentry_ino(dentry->d_parent);
+		if (p_ino && dentry->d_parent != dentry)
+			atomic_dec(&p_ino->count);
+	}
 	dput(ino->dentry);
 
 	dentry->d_inode->i_size = 0;
@@ -570,6 +581,7 @@ static int autofs4_dir_rmdir(struct inod
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
+	struct autofs_info *p_ino;
 	
 	if (!autofs4_oz_mode(sbi))
 		return -EACCES;
@@ -584,8 +596,12 @@ static int autofs4_dir_rmdir(struct inod
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
+	if (atomic_dec_and_test(&ino->count)) {
+		p_ino = autofs4_dentry_ino(dentry->d_parent);
+		if (p_ino && dentry->d_parent != dentry)
+			atomic_dec(&p_ino->count);
+	}
 	dput(ino->dentry);
-
 	dentry->d_inode->i_size = 0;
 	dentry->d_inode->i_nlink = 0;
 
@@ -599,6 +615,7 @@ static int autofs4_dir_mkdir(struct inod
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
+	struct autofs_info *p_ino;
 	struct inode *inode;
 
 	if ( !autofs4_oz_mode(sbi) )
@@ -621,6 +638,10 @@ static int autofs4_dir_mkdir(struct inod
 
 	dentry->d_fsdata = ino;
 	ino->dentry = dget(dentry);
+	atomic_inc(&ino->count);
+	p_ino = autofs4_dentry_ino(dentry->d_parent);
+	if (p_ino && dentry->d_parent != dentry)
+		atomic_inc(&p_ino->count);
 	ino->inode = inode;
 	dir->i_nlink++;
 	dir->i_mtime = CURRENT_TIME;
--- linux-2.6.15-mm3/fs/autofs4/autofs_i.h.expire-tree-false-negative	2006-01-13 16:13:33.000000000 +0800
+++ linux-2.6.15-mm3/fs/autofs4/autofs_i.h	2006-01-13 16:17:08.000000000 +0800
@@ -54,6 +54,7 @@ struct autofs_info {
 
 	struct autofs_sb_info *sbi;
 	unsigned long last_used;
+	atomic_t count;
 
 	mode_t	mode;
 	size_t	size;
--- linux-2.6.15-mm3/fs/autofs4/expire.c.expire-tree-false-negative	2006-01-13 16:16:14.000000000 +0800
+++ linux-2.6.15-mm3/fs/autofs4/expire.c	2006-01-13 16:19:37.000000000 +0800
@@ -101,6 +101,7 @@ static int autofs4_tree_busy(struct vfsm
 			     unsigned long timeout,
 			     int do_now)
 {
+	struct autofs_info *ino;
 	struct dentry *p;
 
 	DPRINTK("top %p %.*s",
@@ -110,14 +111,6 @@ static int autofs4_tree_busy(struct vfsm
 	if (!simple_positive(top))
 		return 1;
 
-	/* Timeout of a tree mount is determined by its top dentry */
-	if (!autofs4_can_expire(top, timeout, do_now))
-		return 1;
-
-	/* Is someone visiting anywhere in the tree ? */
-	if (may_umount_tree(mnt))
-		return 1;
-
 	spin_lock(&dcache_lock);
 	for (p = top; p; p = next_dentry(p, top)) {
 		/* Negative dentry - give up */
@@ -130,17 +123,40 @@ static int autofs4_tree_busy(struct vfsm
 		p = dget(p);
 		spin_unlock(&dcache_lock);
 
+		/*
+		 * Is someone visiting anywhere in the subtree ?
+		 * If there's no mount we need to check the usage
+		 * count for the autofs dentry.
+		 */
+		ino = autofs4_dentry_ino(p);
 		if (d_mountpoint(p)) {
-			/* First busy => tree busy */
 			if (autofs4_mount_busy(mnt, p)) {
 				dput(p);
 				return 1;
 			}
+		} else {
+			unsigned int ino_count = atomic_read(&ino->count);
+
+			/* allow for dget above and top is already dgot */
+			if (p == top)
+				ino_count += 2;
+			else
+				ino_count++;
+
+			if (atomic_read(&p->d_count) > ino_count) {
+				dput(p);
+				return 1;
+			}
 		}
 		dput(p);
 		spin_lock(&dcache_lock);
 	}
 	spin_unlock(&dcache_lock);
+
+	/* Timeout of a tree mount is ultimately determined by its top dentry */
+	if (!autofs4_can_expire(top, timeout, do_now))
+		return 1;
+
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-01-18  6:48 Ian Kent
  0 siblings, 0 replies; 211+ messages in thread
From: Ian Kent @ 2006-01-18  6:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List, linux-fsdevel, autofs

This patch simplifies the expire tree traversal code by using a
function from namespace.c to calculate the next entry in the top
down tree traversals carried out during the expire operation.
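
The traversal idea can be sketched in user space as below. This is not
the kernel function: struct node and next_node() are hypothetical, and
explicit parent, child and sibling pointers replace the list_head
links used by the dcache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; the kernel uses list_head links instead. */
struct node {
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
	int id;
};

/*
 * Return the node that follows p in a top-down (pre-order) walk of
 * the subtree rooted at root, or NULL when the walk is complete.
 */
struct node *next_node(struct node *p, struct node *root)
{
	assert(p != NULL && root != NULL);
	if (p->first_child)
		return p->first_child;		/* descend first */
	while (p != root) {
		if (p->next_sibling)
			return p->next_sibling;	/* then move sideways */
		p = p->parent;			/* otherwise climb */
	}
	return NULL;
}
```

A whole subtree can then be visited with a single loop of the form
"for (p = root; p; p = next_node(p, root))", which is the shape the
patch gives to the expire loops.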

Signed-off-by: Ian Kent <raven@themaw.net>


--- linux-2.6.15-mm2/fs/autofs4/expire.c.expire-traversal-cleanup	2006-01-12 15:11:55.000000000 +0800
+++ linux-2.6.15-mm2/fs/autofs4/expire.c	2006-01-12 15:14:27.000000000 +0800
@@ -72,6 +72,27 @@ done:
 	return status;
 }
 
+/*
+ * Calculate next entry in top down tree traversal.
+ * From next_mnt in namespace.c - elegant.
+ */
+static struct dentry *next_dentry(struct dentry *p, struct dentry *root)
+{
+	struct list_head *next = p->d_subdirs.next;
+
+	if (next == &p->d_subdirs) {
+		while (1) {
+			if (p == root)
+				return NULL;
+			next = p->d_u.d_child.next;
+			if (next != &p->d_parent->d_subdirs)
+				break;
+			p = p->d_parent;
+		}
+	}
+	return list_entry(next, struct dentry, d_u.d_child);
+}
+
 /* Check a directory tree of mount points for busyness
  * The tree is not busy iff no mountpoints are busy
  */
@@ -80,8 +101,7 @@ static int autofs4_tree_busy(struct vfsm
 			     unsigned long timeout,
 			     int do_now)
 {
-	struct dentry *this_parent = top;
-	struct list_head *next;
+	struct dentry *p;
 
 	DPRINTK("top %p %.*s",
 		top, (int)top->d_name.len, top->d_name.name);
@@ -99,49 +119,28 @@ static int autofs4_tree_busy(struct vfsm
 		return 1;
 
 	spin_lock(&dcache_lock);
-repeat:
-	next = this_parent->d_subdirs.next;
-resume:
-	while (next != &this_parent->d_subdirs) {
-		struct dentry *dentry = list_entry(next, struct dentry, d_u.d_child);
-
+	for (p = top; p; p = next_dentry(p, top)) {
 		/* Negative dentry - give up */
-		if (!simple_positive(dentry)) {
-			next = next->next;
+		if (!simple_positive(p))
 			continue;
-		}
 
 		DPRINTK("dentry %p %.*s",
-			dentry, (int)dentry->d_name.len, dentry->d_name.name);
-
-		if (!simple_empty_nolock(dentry)) {
-			this_parent = dentry;
-			goto repeat;
-		}
+			p, (int) p->d_name.len, p->d_name.name);
 
-		dentry = dget(dentry);
+		p = dget(p);
 		spin_unlock(&dcache_lock);
 
-		if (d_mountpoint(dentry)) {
+		if (d_mountpoint(p)) {
 			/* First busy => tree busy */
-			if (autofs4_mount_busy(mnt, dentry)) {
-				dput(dentry);
+			if (autofs4_mount_busy(mnt, p)) {
+				dput(p);
 				return 1;
 			}
 		}
-
-		dput(dentry);
+		dput(p);
 		spin_lock(&dcache_lock);
-		next = next->next;
-	}
-
-	if (this_parent != top) {
-		next = this_parent->d_u.d_child.next;
-		this_parent = this_parent->d_parent;
-		goto resume;
 	}
 	spin_unlock(&dcache_lock);
-
 	return 0;
 }
 
@@ -150,59 +149,38 @@ static struct dentry *autofs4_check_leav
 					   unsigned long timeout,
 					   int do_now)
 {
-	struct dentry *this_parent = parent;
-	struct list_head *next;
+	struct dentry *p;
 
 	DPRINTK("parent %p %.*s",
 		parent, (int)parent->d_name.len, parent->d_name.name);
 
 	spin_lock(&dcache_lock);
-repeat:
-	next = this_parent->d_subdirs.next;
-resume:
-	while (next != &this_parent->d_subdirs) {
-		struct dentry *dentry = list_entry(next, struct dentry, d_u.d_child);
-
+	for (p = parent; p; p = next_dentry(p, parent)) {
 		/* Negative dentry - give up */
-		if (!simple_positive(dentry)) {
-			next = next->next;
+		if (!simple_positive(p))
 			continue;
-		}
 
 		DPRINTK("dentry %p %.*s",
-			dentry, (int)dentry->d_name.len, dentry->d_name.name);
+			p, (int) p->d_name.len, p->d_name.name);
 
-		if (!list_empty(&dentry->d_subdirs)) {
-			this_parent = dentry;
-			goto repeat;
-		}
-
-		dentry = dget(dentry);
+		p = dget(p);
 		spin_unlock(&dcache_lock);
 
-		if (d_mountpoint(dentry)) {
+		if (d_mountpoint(p)) {
 			/* Can we expire this guy */
-			if (!autofs4_can_expire(dentry, timeout, do_now))
+			if (!autofs4_can_expire(p, timeout, do_now))
 				goto cont;
 
 			/* Can we umount this guy */
-			if (!autofs4_mount_busy(mnt, dentry))
-				return dentry;
+			if (!autofs4_mount_busy(mnt, p))
+				return p;
 
 		}
 cont:
-		dput(dentry);
+		dput(p);
 		spin_lock(&dcache_lock);
-		next = next->next;
-	}
-
-	if (this_parent != parent) {
-		next = this_parent->d_u.d_child.next;
-		this_parent = this_parent->d_parent;
-		goto resume;
 	}
 	spin_unlock(&dcache_lock);
-
 	return NULL;
 }
 

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-01-18  6:48 Ian Kent
  0 siblings, 0 replies; 211+ messages in thread
From: Ian Kent @ 2006-01-18  6:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List, linux-fsdevel, autofs

Whitespace and formatting changes to lookup code.

Signed-off-by: Ian Kent <raven@themaw.net>


--- linux-2.6.16-rc1/fs/autofs4/root.c.lookup-cleanup	2006-01-18 09:20:22.000000000 +0800
+++ linux-2.6.16-rc1/fs/autofs4/root.c	2006-01-18 09:24:03.000000000 +0800
@@ -296,20 +296,20 @@ static int try_to_fill_dentry(struct vfs
 {
 	struct super_block *sb = mnt->mnt_sb;
 	struct autofs_sb_info *sbi = autofs4_sbi(sb);
-	struct autofs_info *de_info = autofs4_dentry_ino(dentry);
+	struct autofs_info *ino = autofs4_dentry_ino(dentry);
 	int status = 0;
 
 	/* Block on any pending expiry here; invalidate the dentry
            when expiration is done to trigger mount request with a new
            dentry */
-	if (de_info && (de_info->flags & AUTOFS_INF_EXPIRING)) {
+	if (ino && (ino->flags & AUTOFS_INF_EXPIRING)) {
 		DPRINTK("waiting for expire %p name=%.*s",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
 
 		status = autofs4_wait(sbi, dentry, NFY_NONE);
-		
+
 		DPRINTK("expire done status=%d", status);
-		
+
 		/*
 		 * If the directory still exists the mount request must
 		 * continue otherwise it can't be followed at the right
@@ -323,18 +323,21 @@ static int try_to_fill_dentry(struct vfs
 	DPRINTK("dentry=%p %.*s ino=%p",
 		 dentry, dentry->d_name.len, dentry->d_name.name, dentry->d_inode);
 
-	/* Wait for a pending mount, triggering one if there isn't one already */
+	/*
+	 * Wait for a pending mount, triggering one if there
+	 * isn't one already
+	 */
 	if (dentry->d_inode == NULL) {
 		DPRINTK("waiting for mount name=%.*s",
 			 dentry->d_name.len, dentry->d_name.name);
 
 		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
-		 
+
 		DPRINTK("mount done status=%d", status);
 
 		if (status && dentry->d_inode)
 			return 0; /* Try to get the kernel to invalidate this dentry */
-		
+
 		/* Turn this into a real negative dentry? */
 		if (status == -ENOENT) {
 			dentry->d_time = jiffies + AUTOFS_NEGATIVE_TIMEOUT;
@@ -367,8 +370,10 @@ static int try_to_fill_dentry(struct vfs
 		}
 	}
 
-	/* We don't update the usages for the autofs daemon itself, this
-	   is necessary for recursive autofs mounts */
+	/*
+	 * We don't update the usages for the autofs daemon itself, this
+	 * is necessary for recursive autofs mounts
+	 */
 	if (!autofs4_oz_mode(sbi))
 		autofs4_update_usage(mnt, dentry);
 
@@ -384,9 +389,9 @@ static int try_to_fill_dentry(struct vfs
  * yet completely filled in, and revalidate has to delay such
  * lookups..
  */
-static int autofs4_revalidate(struct dentry * dentry, struct nameidata *nd)
+static int autofs4_revalidate(struct dentry *dentry, struct nameidata *nd)
 {
-	struct inode * dir = dentry->d_parent->d_inode;
+	struct inode *dir = dentry->d_parent->d_inode;
 	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
 	int oz_mode = autofs4_oz_mode(sbi);
 	int flags = nd ? nd->flags : 0;
@@ -462,12 +467,13 @@ static struct dentry *autofs4_lookup(str
 	DPRINTK("name = %.*s",
 		dentry->d_name.len, dentry->d_name.name);
 
+	/* File name too long to exist */
 	if (dentry->d_name.len > NAME_MAX)
-		return ERR_PTR(-ENAMETOOLONG);/* File name too long to exist */
+		return ERR_PTR(-ENAMETOOLONG);
 
 	sbi = autofs4_sbi(dir->i_sb);
-
 	oz_mode = autofs4_oz_mode(sbi);
+
 	DPRINTK("pid = %u, pgrp = %u, catatonic = %d, oz_mode = %d",
 		 current->pid, process_group(current), sbi->catatonic, oz_mode);
 
@@ -519,7 +525,7 @@ static struct dentry *autofs4_lookup(str
 	 * doesn't do the right thing for all system calls, but it should
 	 * be OK for the operations we permit from an autofs.
 	 */
-	if ( dentry->d_inode && d_unhashed(dentry) )
+	if (dentry->d_inode && d_unhashed(dentry))
 		return ERR_PTR(-ENOENT);
 
 	return NULL;

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2006-01-18  6:48 Ian Kent
  0 siblings, 0 replies; 211+ messages in thread
From: Ian Kent @ 2006-01-18  6:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List, linux-fsdevel, autofs

This is the first set of patches for autofs in moving toward a
kernel <-> userspace communication protocol update.

The changes implemented in the patch set are:

1) lookup-cleanup - cleanup of whitespace and formatting.
2) readdir-cleanup - changes readdir routines to use the cursor
   based routines in libfs.c.
3) failed-lookup - fix stale dentries stopping mounts.
4) expire-cleanup - change return values and names of two functions
   to aid code readability.
5) expire-traversal-cleanup - simplify expire by adapting the
   "next_mnt" traversal function from namespace.c.
6) expire-tree-false-negative - fix an expire case which returns
   busy on tree mounts when they are not.
7) expire-not-busy-only - alter expire semantics to match that
   needed for a rework of autofs direct mounts.
8) remove-update_atime - remove update of atime in favour of
   letting the VFS update it.
9) add-show_options - add show_options method to display autofs4
   mount options in the proc filesystem.
10) waitq-cleanup - whitespace cleanup of waitq.c
11) rename-simple_empty_nolock - rename function according to
   kernel conventions.
12) may_umount-to-boolean - change may_umount* functions to
  boolean to aid code readability.


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-11-07 22:34 jhlegqsiwnpek
  0 siblings, 0 replies; 211+ messages in thread
From: jhlegqsiwnpek @ 2005-11-07 22:34 UTC (permalink / raw)




^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-10-25  9:00 Miklos Szeredi
  0 siblings, 0 replies; 211+ messages in thread
From: Miklos Szeredi @ 2005-10-25  9:00 UTC (permalink / raw)
  To: akpm; +Cc: viro, linux-kernel, linux-fsdevel

Andrew,

can you please apply 1,4,5,6,7 (not 2,3,8) from the VFS+FUSE series?

Or should I resend the ones not rejected by Al?

Thanks,
Miklos


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
       [not found]   ` <20050807102801.GA4141@infradead.org>
@ 2005-08-07 10:57     ` Miklos Szeredi
  0 siblings, 0 replies; 211+ messages in thread
From: Miklos Szeredi @ 2005-08-07 10:57 UTC (permalink / raw)
  To: hch; +Cc: linux-fsdevel, akpm

Christoph Hellwig wrote:
> I'd rather forbid binds to the foreign namespace, though.

Bind is a directional operation.  TO a foreign namespace is already
forbidden, FROM a foreign namespace it's not.

Is that logical?  Not too much, I agree.

Which is better?

  a) removing restrictions from bind

  b) adding more restrictions to bind

That's up for discussion.  I'd opt for a).  Usually it's better to
have less restrictions in the kernel, if it doesn't impact security.

For an implementation of a) (and more) see this patchset, based on
Mike Waychison's earlier work:

  http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111745909923350&w=2

Miklos

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
  2005-07-25 22:44 (unknown) Ram Pai
                   ` (5 preceding siblings ...)
  2005-07-25 22:44 ` (unknown) Ram Pai
@ 2005-07-25 22:44 ` Ram Pai
  6 siblings, 0 replies; 211+ messages in thread
From: Ram Pai @ 2005-07-25 22:44 UTC (permalink / raw)
  To: akpm, Al Viro; +Cc: Avantika Mathur, Mike Waychison, miklos@szeredi.hu, Janak Desai <janak@us.ibm.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

Subject: [PATCH 7/7] shared subtree
Content-Type: text/x-patch; name=automount.patch
Content-Disposition: inline; filename=automount.patch

Adds support for mount/umount propagation for autofs-initiated operations.
RP

Signed by Ram Pai (linuxram@us.ibm.com)

 fs/namespace.c        |  176 +++++++++++++++++++-------------------------------
 fs/pnode.c            |   12 +--
 include/linux/pnode.h |    3 
 3 files changed, 76 insertions(+), 115 deletions(-)

Index: 2.6.12.work2/fs/namespace.c
===================================================================
--- 2.6.12.work2.orig/fs/namespace.c
+++ 2.6.12.work2/fs/namespace.c
@@ -202,6 +202,9 @@ struct vfsmount *do_attach_prepare_mnt(s
 		if(!(child_mnt = clone_mnt(template_mnt,
 				template_mnt->mnt_root)))
 			return NULL;
+		spin_lock(&vfsmount_lock);
+		list_del_init(&child_mnt->mnt_fslink);
+		spin_unlock(&vfsmount_lock);
 	} else
 		child_mnt = template_mnt;
 
@@ -355,35 +358,14 @@ struct seq_operations mounts_op = {
  */
 int may_umount_tree(struct vfsmount *mnt)
 {
-	struct list_head *next;
-	struct vfsmount *this_parent = mnt;
-	int actual_refs;
-	int minimum_refs;
+	int actual_refs=0;
+	int minimum_refs=0;
+	struct vfsmount *p;
 
 	spin_lock(&vfsmount_lock);
-	actual_refs = atomic_read(&mnt->mnt_count);
-	minimum_refs = 2;
-repeat:
-	next = this_parent->mnt_mounts.next;
-resume:
-	while (next != &this_parent->mnt_mounts) {
-		struct vfsmount *p = list_entry(next, struct vfsmount, mnt_child);
-
-		next = next->next;
-
+	for (p = mnt; p; p = next_mnt(p, mnt)) {
 		actual_refs += atomic_read(&p->mnt_count);
 		minimum_refs += 2;
-
-		if (!list_empty(&p->mnt_mounts)) {
-			this_parent = p;
-			goto repeat;
-		}
-	}
-
-	if (this_parent != mnt) {
-		next = this_parent->mnt_child.next;
-		this_parent = this_parent->mnt_parent;
-		goto resume;
 	}
 	spin_unlock(&vfsmount_lock);
 
@@ -395,18 +377,18 @@ resume:
 
 EXPORT_SYMBOL(may_umount_tree);
 
-int mount_busy(struct vfsmount *mnt)
+int mount_busy(struct vfsmount *mnt, int refcnt)
 {
 	struct vfspnode *parent_pnode;
 
 	if (mnt == mnt->mnt_parent || !IS_MNT_SHARED(mnt->mnt_parent))
-		return do_refcount_check(mnt, 2);
+		return do_refcount_check(mnt, refcnt);
 
 	parent_pnode = mnt->mnt_parent->mnt_pnode;
 	BUG_ON(!parent_pnode);
 	return pnode_mount_busy(parent_pnode,
 			mnt->mnt_mountpoint,
-			mnt->mnt_root, mnt);
+			mnt->mnt_root, mnt, refcnt);
 }
 
 /**
@@ -424,9 +406,12 @@ int mount_busy(struct vfsmount *mnt)
  */
 int may_umount(struct vfsmount *mnt)
 {
-	if (mount_busy(mnt))
-		return -EBUSY;
-	return 0;
+	int ret=0;
+	spin_lock(&vfsmount_lock);
+	if (mount_busy(mnt, 2))
+		ret = -EBUSY;
+	spin_unlock(&vfsmount_lock);
+	return ret;
 }
 
 EXPORT_SYMBOL(may_umount);
@@ -445,7 +430,26 @@ void do_detach_mount(struct vfsmount *mn
 	spin_lock(&vfsmount_lock);
 }
 
-void __umount_tree(struct vfsmount *mnt, int propogate)
+void umount_mnt(struct vfsmount *mnt, int propogate)
+{
+	if (propogate && mnt->mnt_parent != mnt &&
+		IS_MNT_SHARED(mnt->mnt_parent)) {
+		struct vfspnode *parent_pnode
+			= mnt->mnt_parent->mnt_pnode;
+		BUG_ON(!parent_pnode);
+		pnode_umount(parent_pnode,
+			mnt->mnt_mountpoint,
+			mnt->mnt_root);
+	} else {
+		if (IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt)) {
+			BUG_ON(!mnt->mnt_pnode);
+			pnode_disassociate_mnt(mnt);
+		}
+		do_detach_mount(mnt);
+	}
+}
+
+static void __umount_tree(struct vfsmount *mnt, int propogate)
 {
 	struct vfsmount *p;
 	LIST_HEAD(kill);
@@ -459,21 +463,7 @@ void __umount_tree(struct vfsmount *mnt,
 		mnt = list_entry(kill.next, struct vfsmount, mnt_list);
 		list_del_init(&mnt->mnt_list);
 		list_del_init(&mnt->mnt_fslink);
-		if (propogate && mnt->mnt_parent != mnt &&
-			IS_MNT_SHARED(mnt->mnt_parent)) {
-			struct vfspnode *parent_pnode
-				= mnt->mnt_parent->mnt_pnode;
-			BUG_ON(!parent_pnode);
-			pnode_umount(parent_pnode,
-				mnt->mnt_mountpoint,
-				mnt->mnt_root);
-		} else {
-			if (IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt)) {
-				BUG_ON(!mnt->mnt_pnode);
-				pnode_disassociate_mnt(mnt);
-			}
-			do_detach_mount(mnt);
-		}
+		umount_mnt(mnt, propogate);
 	}
 }
 
@@ -573,7 +563,7 @@ int do_umount(struct vfsmount *mnt, int 
 		spin_lock(&vfsmount_lock);
 	}
 	retval = -EBUSY;
-	if (flags & MNT_DETACH || !mount_busy(mnt)) {
+	if (flags & MNT_DETACH || !mount_busy(mnt, 2)) {
 		if (!list_empty(&mnt->mnt_list))
 			umount_tree(mnt);
 		retval = 0;
@@ -755,8 +745,11 @@ static void commit_attach_recursive_mnt(
 
 			if (slave_flag)
 				pnode_add_slave_pnode(master_pnode, tmp_pnode);
-			else
+			else {
+				spin_lock(&vfspnode_lock);
 				pnode_merge_pnode(tmp_pnode, master_pnode);
+				spin_unlock(&vfspnode_lock);
+			}
 
 			/*
 			 * we don't need the extra reference to
@@ -820,7 +813,6 @@ static void abort_attach_recursive_mnt(s
 	list_del_init(head);
 }
 
-
  /*
  *  @source_mnt : mount tree to be attached
  *  @nd		: place the mount tree @source_mnt is attached
@@ -1518,8 +1510,9 @@ static int do_move_mount(struct nameidat
 	detach_recursive_mnt(old_nd.mnt, &parent_nd);
 	spin_unlock(&vfsmount_lock);
 	if ((err = attach_recursive_mnt(old_nd.mnt, nd, 1))) {
+		spin_lock(&vfsmount_lock);
 		undo_detach_recursive_mnt(old_nd.mnt, &parent_nd);
-		goto out1;
+		goto out2;
 	}
 	spin_lock(&vfsmount_lock);
 	mntput(old_nd.mnt);
@@ -1621,6 +1614,8 @@ void mark_mounts_for_expiry(struct list_
 	if (list_empty(mounts))
 		return;
 
+	down_write(&namespace_sem);
+
 	spin_lock(&vfsmount_lock);
 
 	/* extract from the expiration list every vfsmount that matches the
@@ -1630,8 +1625,7 @@ void mark_mounts_for_expiry(struct list_
 	 *   cleared by mntput())
 	 */
 	list_for_each_entry_safe(mnt, next, mounts, mnt_fslink) {
-		if (!xchg(&mnt->mnt_expiry_mark, 1) ||
-		    atomic_read(&mnt->mnt_count) != 1)
+		if (!xchg(&mnt->mnt_expiry_mark, 1) || mount_busy(mnt, 1))
 			continue;
 
 		mntget(mnt);
@@ -1639,12 +1633,13 @@ void mark_mounts_for_expiry(struct list_
 	}
 
 	/*
-	 * go through the vfsmounts we've just consigned to the graveyard to
-	 * - check that they're still dead
+	 * go through the vfsmounts we've just consigned to the graveyard
 	 * - delete the vfsmount from the appropriate namespace under lock
 	 * - dispose of the corpse
 	 */
 	while (!list_empty(&graveyard)) {
+		struct super_block *sb;
+
 		mnt = list_entry(graveyard.next, struct vfsmount, mnt_fslink);
 		list_del_init(&mnt->mnt_fslink);
 
@@ -1655,60 +1650,25 @@ void mark_mounts_for_expiry(struct list_
 			continue;
 		get_namespace(namespace);
 
-		spin_unlock(&vfsmount_lock);
-		down_write(&namespace_sem);
-		spin_lock(&vfsmount_lock);
-
-		/* check that it is still dead: the count should now be 2 - as
-		 * contributed by the vfsmount parent and the mntget above */
-		if (atomic_read(&mnt->mnt_count) == 2) {
-			struct vfsmount *xdmnt;
-			struct dentry *xdentry;
-
-			/* delete from the namespace */
-			list_del_init(&mnt->mnt_list);
-			list_del_init(&mnt->mnt_child);
-			list_del_init(&mnt->mnt_hash);
-			mnt->mnt_mountpoint->d_mounted--;
-
-			xdentry = mnt->mnt_mountpoint;
-			mnt->mnt_mountpoint = mnt->mnt_root;
-			xdmnt = mnt->mnt_parent;
-			mnt->mnt_parent = mnt;
-
-			spin_unlock(&vfsmount_lock);
-
-			mntput(xdmnt);
-			dput(xdentry);
-
-			/* now lay it to rest if this was the last ref on the
-			 * superblock */
-			if (atomic_read(&mnt->mnt_sb->s_active) == 1) {
-				/* last instance - try to be smart */
-				lock_kernel();
-				DQUOT_OFF(mnt->mnt_sb);
-				acct_auto_close(mnt->mnt_sb);
-				unlock_kernel();
-			}
-
-			mntput(mnt);
-		} else {
-			/* someone brought it back to life whilst we didn't
-			 * have any locks held so return it to the expiration
-			 * list */
-			list_add_tail(&mnt->mnt_fslink, mounts);
-			spin_unlock(&vfsmount_lock);
+		sb = mnt->mnt_sb;
+		umount_mnt(mnt, 1);
+		/*
+		 * now lay it to rest if this was the last ref on the
+		 * superblock
+		 */
+		if (atomic_read(&sb->s_active) == 1) {
+			/* last instance - try to be smart */
+			lock_kernel();
+			DQUOT_OFF(sb);
+			acct_auto_close(sb);
+			unlock_kernel();
 		}
-
-		up_write(&namespace_sem);
-
 		mntput(mnt);
-		put_namespace(namespace);
 
-		spin_lock(&vfsmount_lock);
+		put_namespace(namespace);
 	}
-
 	spin_unlock(&vfsmount_lock);
+	up_write(&namespace_sem);
 }
 
 EXPORT_SYMBOL_GPL(mark_mounts_for_expiry);
@@ -2149,24 +2109,24 @@ asmlinkage long sys_pivot_root(const cha
 	detach_recursive_mnt(new_nd.mnt, &parent_nd);
 
 	spin_unlock(&vfsmount_lock);
- 	if ((error = attach_recursive_mnt(user_nd.mnt, &old_nd, 1))) {
+ 	if ((error = attach_recursive_mnt(new_nd.mnt, &root_parent, 1))) {
 		spin_lock(&vfsmount_lock);
 		undo_detach_recursive_mnt(new_nd.mnt, &parent_nd);
 		undo_detach_recursive_mnt(user_nd.mnt, &root_parent);
 		goto out3;
 	}
 	spin_lock(&vfsmount_lock);
- 	mntput(user_nd.mnt);
+ 	mntput(new_nd.mnt);
 
 	spin_unlock(&vfsmount_lock);
- 	if ((error = attach_recursive_mnt(new_nd.mnt, &root_parent, 1))) {
+ 	if ((error = attach_recursive_mnt(user_nd.mnt, &old_nd, 1))) {
 		spin_lock(&vfsmount_lock);
 		undo_detach_recursive_mnt(new_nd.mnt, &parent_nd);
 		undo_detach_recursive_mnt(user_nd.mnt, &root_parent);
 		goto out3;
 	}
 	spin_lock(&vfsmount_lock);
- 	mntput(new_nd.mnt);
+ 	mntput(user_nd.mnt);
 
 	spin_unlock(&vfsmount_lock);
 	chroot_fs_refs(&user_nd, &new_nd);
Index: 2.6.12.work2/fs/pnode.c
===================================================================
--- 2.6.12.work2.orig/fs/pnode.c
+++ 2.6.12.work2/fs/pnode.c
@@ -29,7 +29,7 @@
 static kmem_cache_t * pnode_cachep;
 
 /* spinlock for pnode related operations */
- __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfspnode_lock);
+  __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfspnode_lock);
 
 enum pnode_vfs_type {
 	PNODE_MEMBER_VFS = 0x01,
@@ -673,6 +673,7 @@ static int vfs_busy(struct vfsmount *mnt
 	struct dentry *dentry = va_arg(args, struct dentry *);
 	struct dentry *rootdentry = va_arg(args, struct dentry *);
 	struct vfsmount *origmnt = va_arg(args, struct vfsmount *);
+	int    refcnt = va_arg(args, int);
 	struct vfsmount *child_mnt;
 	int ret=0;
 
@@ -685,22 +686,21 @@ static int vfs_busy(struct vfsmount *mnt
 
 	if (list_empty(&child_mnt->mnt_mounts)) {
 		if (origmnt == child_mnt)
-			ret = do_refcount_check(child_mnt, 3);
+			ret = do_refcount_check(child_mnt, refcnt+1);
 		else
-			ret = do_refcount_check(child_mnt, 2);
+			ret = do_refcount_check(child_mnt, refcnt);
 	}
 	mntput(child_mnt);
 	return ret;
 }
 
 int pnode_mount_busy(struct vfspnode *pnode, struct dentry *mntpt,
-		struct dentry *root, struct vfsmount *mnt)
+		struct dentry *root, struct vfsmount *mnt, int refcnt)
 {
 	return pnode_traverse(pnode, NULL, NULL,
-			NULL, NULL, vfs_busy, mntpt, root, mnt);
+			NULL, NULL, vfs_busy, mntpt, root, mnt, refcnt);
 }
 
-
 int vfs_umount(struct vfsmount *mnt, enum pnode_vfs_type flag,
 		void *indata, va_list args)
 {
Index: 2.6.12.work2/include/linux/pnode.h
===================================================================
--- 2.6.12.work2.orig/include/linux/pnode.h
+++ 2.6.12.work2/include/linux/pnode.h
@@ -77,6 +77,7 @@ void pnode_add_member_mnt(struct vfspnod
 void pnode_del_slave_mnt(struct vfsmount *);
 void pnode_del_member_mnt(struct vfsmount *);
 void pnode_disassociate_mnt(struct vfsmount *);
+void pnode_member_to_slave(struct vfsmount *);
 void pnode_add_slave_pnode(struct vfspnode *, struct vfspnode *);
 struct vfsmount * pnode_make_mounted(struct vfspnode *, struct vfsmount *,
 		struct dentry *);
@@ -91,5 +92,5 @@ int pnode_commit_mount(struct vfspnode *
 int pnode_abort_mount(struct vfspnode *, struct vfsmount *);
 int pnode_umount(struct vfspnode *, struct dentry *, struct dentry *);
 int pnode_mount_busy(struct vfspnode *, struct dentry *, struct dentry *,
-		struct vfsmount *);
+		struct vfsmount *, int);
 #endif /* _LINUX_PNODE_H */

* (unknown)
  2005-07-25 22:44 (unknown) Ram Pai
                   ` (4 preceding siblings ...)
  2005-07-25 22:44 ` (unknown) Ram Pai
@ 2005-07-25 22:44 ` Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
  6 siblings, 0 replies; 211+ messages in thread
From: Ram Pai @ 2005-07-25 22:44 UTC (permalink / raw)
  To: akpm, Al Viro; +Cc: Avantika Mathur, Mike Waychison, miklos@szeredi.hu, Janak Desai <janak@us.ibm.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

Subject: [PATCH 6/7] shared subtree
Content-Type: text/x-patch; name=namespace.patch
Content-Disposition: inline; filename=namespace.patch

Adds ability to clone a namespace that has shared/private/slave/unclone
subtrees in it.

RP


Signed by Ram Pai (linuxram@us.ibm.com)

 fs/namespace.c |    9 +++++++++
 1 files changed, 9 insertions(+)

Index: 2.6.12-rc6.work1/fs/namespace.c
===================================================================
--- 2.6.12-rc6.work1.orig/fs/namespace.c
+++ 2.6.12-rc6.work1/fs/namespace.c
@@ -1894,6 +1894,13 @@ int copy_namespace(int flags, struct tas
 	q = new_ns->root;
 	while (p) {
 		q->mnt_namespace = new_ns;
+
+		if (IS_MNT_SHARED(q))
+			pnode_add_member_mnt(q->mnt_pnode, q);
+		else if (IS_MNT_SLAVE(q))
+			pnode_add_slave_mnt(q->mnt_pnode, q);
+		put_pnode(q->mnt_pnode);
+
 		if (fs) {
 			if (p == fs->rootmnt) {
 				rootmnt = p;
@@ -2271,6 +2278,8 @@ void __put_namespace(struct namespace *n
 	spin_lock(&vfsmount_lock);
 
 	list_for_each_entry(mnt, &namespace->list, mnt_list) {
+		if (mnt->mnt_pnode)
+			pnode_disassociate_mnt(mnt);
 		mnt->mnt_namespace = NULL;
 	}
 

* (unknown)
  2005-07-25 22:44 (unknown) Ram Pai
                   ` (3 preceding siblings ...)
  2005-07-25 22:44 ` (unknown) Ram Pai
@ 2005-07-25 22:44 ` Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
  6 siblings, 0 replies; 211+ messages in thread
From: Ram Pai @ 2005-07-25 22:44 UTC (permalink / raw)
  To: akpm, Al Viro; +Cc: Avantika Mathur, Mike Waychison, miklos@szeredi.hu, Janak Desai <janak@us.ibm.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

Subject: [PATCH 5/7] shared subtree
Content-Type: text/x-patch; name=umount.patch
Content-Disposition: inline; filename=umount.patch

Adds ability to unmount a shared/slave/unclone/private tree

RP

Signed by Ram Pai (linuxram@us.ibm.com)
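
Illustrative note (not part of the patch): for a mount whose parent is not
shared, the busy test added below reduces to do_refcount_check(mnt, 2), and
do_umount() proceeds when MNT_DETACH is set or the mount is not busy. A
minimal userspace sketch of that decision follows. struct mnt_ref,
refcount_exceeds() and try_umount() are hypothetical names, and the shared
parent case, which walks the pnode tree, is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the reference count held in a vfsmount. */
struct mnt_ref {
	int count;
};

/* Mirrors do_refcount_check(): true if the count is greater than 'count'. */
static int refcount_exceeds(const struct mnt_ref *m, int count)
{
	return m->count > count;
}

/*
 * Mirrors the test in do_umount() for a mount under a non-shared parent:
 * a lazy detach always proceeds, otherwise the mount must not hold more
 * than the two expected references.
 */
static int try_umount(const struct mnt_ref *m, int detach)
{
	if (detach || !refcount_exceeds(m, 2))
		return 0;
	return -EBUSY;
}
```

Keeping the threshold as a parameter of the check is what lets a caller
with a different number of expected references reuse the same helper.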

 fs/namespace.c        |   76 ++++++++++++++++++++++++++++++++++++++++----------
 fs/pnode.c            |   66 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h    |    3 +
 include/linux/pnode.h |    9 ++++-
 4 files changed, 138 insertions(+), 16 deletions(-)

Index: 2.6.12.work2/fs/pnode.c
===================================================================
--- 2.6.12.work2.orig/fs/pnode.c
+++ 2.6.12.work2/fs/pnode.c
@@ -666,3 +666,69 @@ int pnode_abort_mount(struct vfspnode *p
 			NULL, (void *)NULL, NULL, NULL,
 			vfs_abort_mount_func, exception_mnt);
 }
+
+static int vfs_busy(struct vfsmount *mnt, enum pnode_vfs_type flag,
+		void *indata, va_list args)
+{
+	struct dentry *dentry = va_arg(args, struct dentry *);
+	struct dentry *rootdentry = va_arg(args, struct dentry *);
+	struct vfsmount *origmnt = va_arg(args, struct vfsmount *);
+	struct vfsmount *child_mnt;
+	int ret=0;
+
+	spin_unlock(&vfsmount_lock);
+	child_mnt = __lookup_mnt(mnt, dentry, rootdentry);
+	spin_lock(&vfsmount_lock);
+
+	if (!child_mnt)
+		return 0;
+
+	if (list_empty(&child_mnt->mnt_mounts)) {
+		if (origmnt == child_mnt)
+			ret = do_refcount_check(child_mnt, 3);
+		else
+			ret = do_refcount_check(child_mnt, 2);
+	}
+	mntput(child_mnt);
+	return ret;
+}
+
+int pnode_mount_busy(struct vfspnode *pnode, struct dentry *mntpt,
+		struct dentry *root, struct vfsmount *mnt)
+{
+	return pnode_traverse(pnode, NULL, NULL,
+			NULL, NULL, vfs_busy, mntpt, root, mnt);
+}
+
+
+int vfs_umount(struct vfsmount *mnt, enum pnode_vfs_type flag,
+		void *indata, va_list args)
+{
+	struct vfsmount *child_mnt;
+	struct dentry *dentry, *rootdentry;
+
+
+	dentry = va_arg(args, struct dentry *);
+	rootdentry = va_arg(args, struct dentry *);
+
+	spin_unlock(&vfsmount_lock);
+	child_mnt = __lookup_mnt(mnt, dentry, rootdentry);
+	spin_lock(&vfsmount_lock);
+	mntput(child_mnt);
+	if (child_mnt && list_empty(&child_mnt->mnt_mounts)) {
+		if (IS_MNT_SHARED(child_mnt) ||
+				IS_MNT_SLAVE(child_mnt)) {
+			BUG_ON(!child_mnt->mnt_pnode);
+			pnode_disassociate_mnt(child_mnt);
+		}
+		do_detach_mount(child_mnt);
+	}
+	return 0;
+}
+
+int pnode_umount(struct vfspnode *pnode, struct dentry *dentry,
+			struct dentry *rootdentry)
+{
+	return pnode_traverse(pnode, NULL, (void *)NULL,
+			NULL, NULL, vfs_umount, dentry, rootdentry);
+}
Index: 2.6.12.work2/fs/namespace.c
===================================================================
--- 2.6.12.work2.orig/fs/namespace.c
+++ 2.6.12.work2/fs/namespace.c
@@ -395,6 +395,20 @@ resume:
 
 EXPORT_SYMBOL(may_umount_tree);
 
+int mount_busy(struct vfsmount *mnt)
+{
+	struct vfspnode *parent_pnode;
+
+	if (mnt == mnt->mnt_parent || !IS_MNT_SHARED(mnt->mnt_parent))
+		return do_refcount_check(mnt, 2);
+
+	parent_pnode = mnt->mnt_parent->mnt_pnode;
+	BUG_ON(!parent_pnode);
+	return pnode_mount_busy(parent_pnode,
+			mnt->mnt_mountpoint,
+			mnt->mnt_root, mnt);
+}
+
 /**
  * may_umount - check if a mount point is busy
  * @mnt: root of mount
@@ -410,14 +424,28 @@ EXPORT_SYMBOL(may_umount_tree);
  */
 int may_umount(struct vfsmount *mnt)
 {
-	if (atomic_read(&mnt->mnt_count) > 2)
+	if (mount_busy(mnt))
 		return -EBUSY;
 	return 0;
 }
 
 EXPORT_SYMBOL(may_umount);
 
-void umount_tree(struct vfsmount *mnt)
+void do_detach_mount(struct vfsmount *mnt)
+{
+	struct nameidata old_nd;
+	if (mnt != mnt->mnt_parent) {
+		detach_mnt(mnt, &old_nd);
+		path_release(&old_nd);
+	}
+	list_del_init(&mnt->mnt_list);
+	list_del_init(&mnt->mnt_fslink);
+	spin_unlock(&vfsmount_lock);
+	mntput(mnt);
+	spin_lock(&vfsmount_lock);
+}
+
+void __umount_tree(struct vfsmount *mnt, int propogate)
 {
 	struct vfsmount *p;
 	LIST_HEAD(kill);
@@ -431,20 +459,40 @@ void umount_tree(struct vfsmount *mnt)
 		mnt = list_entry(kill.next, struct vfsmount, mnt_list);
 		list_del_init(&mnt->mnt_list);
 		list_del_init(&mnt->mnt_fslink);
-		if (mnt->mnt_parent == mnt) {
-			spin_unlock(&vfsmount_lock);
+		if (propogate && mnt->mnt_parent != mnt &&
+			IS_MNT_SHARED(mnt->mnt_parent)) {
+			struct vfspnode *parent_pnode
+				= mnt->mnt_parent->mnt_pnode;
+			BUG_ON(!parent_pnode);
+			pnode_umount(parent_pnode,
+				mnt->mnt_mountpoint,
+				mnt->mnt_root);
 		} else {
-			struct nameidata old_nd;
-			detach_mnt(mnt, &old_nd);
-			spin_unlock(&vfsmount_lock);
-			path_release(&old_nd);
+			if (IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt)) {
+				BUG_ON(!mnt->mnt_pnode);
+				pnode_disassociate_mnt(mnt);
+			}
+			do_detach_mount(mnt);
 		}
-		mntput(mnt);
-		spin_lock(&vfsmount_lock);
 	}
 }
 
-static int do_umount(struct vfsmount *mnt, int flags)
+void umount_tree(struct vfsmount *mnt)
+{
+	__umount_tree(mnt, 1);
+}
+
+/*
+ * return true if the refcount is greater than count
+ */
+int do_refcount_check(struct vfsmount *mnt, int count)
+{
+
+	int mycount = atomic_read(&mnt->mnt_count);
+	return (mycount > count);
+}
+
+int do_umount(struct vfsmount *mnt, int flags)
 {
 	struct super_block * sb = mnt->mnt_sb;
 	int retval;
@@ -525,7 +573,7 @@ static int do_umount(struct vfsmount *mn
 		spin_lock(&vfsmount_lock);
 	}
 	retval = -EBUSY;
-	if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) {
+	if (flags & MNT_DETACH || !mount_busy(mnt)) {
 		if (!list_empty(&mnt->mnt_list))
 			umount_tree(mnt);
 		retval = 0;
@@ -659,7 +707,7 @@ static struct vfsmount *copy_tree(struct
  Enomem:
 	if (res) {
 		spin_lock(&vfsmount_lock);
-		umount_tree(res);
+		__umount_tree(res, 0);
 		spin_unlock(&vfsmount_lock);
 	}
 	return NULL;
@@ -1341,7 +1389,7 @@ static int do_loopback(struct nameidata 
 		err = graft_tree(mnt, nd);
 		if (err) {
  			spin_lock(&vfsmount_lock);
- 			umount_tree(mnt);
+ 			__umount_tree(mnt, 0);
  			spin_unlock(&vfsmount_lock);
 			/*
 			 * ok we failed! so undo any overlay
Index: 2.6.12.work2/include/linux/fs.h
===================================================================
--- 2.6.12.work2.orig/include/linux/fs.h
+++ 2.6.12.work2/include/linux/fs.h
@@ -1216,12 +1216,15 @@ extern struct vfsmount *kern_mount(struc
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 extern long do_mount(char *, char *, char *, unsigned long, void *);
+extern int do_umount(struct vfsmount *, int);
 extern struct vfsmount *do_attach_prepare_mnt(struct vfsmount *,
 		struct dentry *, struct vfsmount *, int);
 extern void do_attach_commit_mnt(struct vfsmount *);
 extern struct vfsmount *do_make_mounted(struct vfsmount *, struct dentry *);
 extern int do_make_unmounted(struct vfsmount *);
 extern void do_detach_prepare_mnt(struct vfsmount *, int);
+extern void do_detach_mount(struct vfsmount *);
+extern int do_refcount_check(struct vfsmount *, int );
 
 extern int vfs_statfs(struct super_block *, struct kstatfs *);
 
Index: 2.6.12.work2/include/linux/pnode.h
===================================================================
--- 2.6.12.work2.orig/include/linux/pnode.h
+++ 2.6.12.work2/include/linux/pnode.h
@@ -63,13 +63,15 @@ put_pnode_locked(struct vfspnode *pnode)
 {
 	if (!pnode)
 		return;
-	if (atomic_dec_and_test(&pnode->pnode_count)) {
+	if (atomic_dec_and_test(&pnode->pnode_count))
 		__put_pnode(pnode);
-	}
 }
 
 void __init pnode_init(unsigned long );
 struct vfspnode * pnode_alloc(void);
+void pnode_free(struct vfspnode *);
+int pnode_is_busy(struct vfspnode *);
+int pnode_umount_vfs(struct vfspnode *, struct dentry *, struct dentry *, int);
 void pnode_add_slave_mnt(struct vfspnode *, struct vfsmount *);
 void pnode_add_member_mnt(struct vfspnode *, struct vfsmount *);
 void pnode_del_slave_mnt(struct vfsmount *);
@@ -87,4 +89,7 @@ int pnode_prepare_mount(struct vfspnode 
 		struct vfsmount *, struct vfsmount *);
 int pnode_commit_mount(struct vfspnode *, int);
 int pnode_abort_mount(struct vfspnode *, struct vfsmount *);
+int pnode_umount(struct vfspnode *, struct dentry *, struct dentry *);
+int pnode_mount_busy(struct vfspnode *, struct dentry *, struct dentry *,
+		struct vfsmount *);
 #endif /* _LINUX_PNODE_H */

* (unknown)
  2005-07-25 22:44 (unknown) Ram Pai
                   ` (2 preceding siblings ...)
  2005-07-25 22:44 ` (unknown) Ram Pai
@ 2005-07-25 22:44 ` Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 211+ messages in thread
From: Ram Pai @ 2005-07-25 22:44 UTC (permalink / raw)
  To: akpm, Al Viro; +Cc: Avantika Mathur, Mike Waychison, miklos@szeredi.hu, Janak Desai <janak@us.ibm.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

Subject: [PATCH 4/7] shared subtree
Content-Type: text/x-patch; name=move.patch
Content-Disposition: inline; filename=move.patch

Adds the ability to move a shared/private/slave/unclone tree to any other
shared/private/slave/unclone tree. Also incorporates the same behavior
for pivot_root().

RP


Signed by Ram Pai (linuxram@us.ibm.com)
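
Illustrative note (not part of the patch): the MOVE MOUNT OPERATION table
in the attach_recursive_mnt() comment below can be read as a small lookup
from (source type, destination type) to the type of the moved mount. The
sketch transcribes only that table. enum mnt_type and move_result() are
hypothetical names, and the propagation side effects marked (+), (++) and
(+++) in the table are not modelled.

```c
#include <assert.h>

/* Hypothetical mount types, one per row/column of the table. */
enum mnt_type {
	MT_SHARED,
	MT_PRIVATE,
	MT_SLAVE,
	MT_UNCLONABLE,
	MT_INVALID      /* the move is rejected */
};

/*
 * Resulting type of a moved mount, per the MOVE MOUNT table:
 *  - a shared source stays shared under any destination;
 *  - under a shared destination, private and slave sources become
 *    shared, and an unclonable source is invalid;
 *  - under any other destination the source keeps its own type.
 */
static enum mnt_type move_result(enum mnt_type src, enum mnt_type dest)
{
	if (src == MT_INVALID || dest == MT_INVALID)
		return MT_INVALID;
	if (src == MT_SHARED)
		return MT_SHARED;
	if (dest != MT_SHARED)
		return src;
	return src == MT_UNCLONABLE ? MT_INVALID : MT_SHARED;
}
```

The invalid entry corresponds to the check in do_move_mount() that refuses
to move a tree containing unclonable mounts under a shared mount.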

 fs/namespace.c        |  196 +++++++++++++++++++++++++++++++++++++++++++-------
 include/linux/mount.h |    2 
 2 files changed, 173 insertions(+), 25 deletions(-)

Index: 2.6.12.work2/fs/namespace.c
===================================================================
--- 2.6.12.work2.orig/fs/namespace.c
+++ 2.6.12.work2/fs/namespace.c
@@ -772,9 +772,12 @@ static void abort_attach_recursive_mnt(s
 	list_del_init(head);
 }
 
+
  /*
  *  @source_mnt : mount tree to be attached
  *  @nd		: place the mount tree @source_mnt is attached
+ *  @move	: use the move semantics if set, else use normal attach semantics
+ *                as explained below
  *
  *  NOTE: in the table below explains the semantics when a source vfsmount
  *  of a given type is attached to a destination vfsmount of a give type.
@@ -801,12 +804,41 @@ static void abort_attach_recursive_mnt(s
  *  |		|		|       	 |   	    |    	|
  *   ********************************************************************
  *
- * (++)  the mount will be propogated to all the vfsmounts in the pnode tree
+ * (++)  the mount is propogated to all the vfsmounts in the pnode tree
  *    	  of the destination vfsmount, and all the non-slave new mounts in
  *    	  destination vfsmount will be added the source vfsmount's pnode.
- * (+)  the mount will be propogated to the destination vfsmount
+ * (+)  the mount is propogated to the destination vfsmount
  *    	  and the new mount will be added to the source vfsmount's pnode.
  *
+ *  ---------------------------------------------------------------------
+ *  |				MOVE MOUNT OPERATION			|
+ *  |*******************************************************************|
+ *  |  dest --> | shared	|	private	 |  slave   |unclonable	|
+ *  | source	|		|       	 |   	    |    	|
+ *  |   |   	|		|       	 |   	    |    	|
+ *  |   v 	|		|       	 |   	    |    	|
+ *  |*******************************************************************|
+ *  |	     	|		|       	 |   	    |    	|
+ *  |  shared	| shared (++) 	|      shared (+)|shared (+)| shared (+)|
+ *  |		|		|       	 |   	    |    	|
+ *  |		|		|       	 |   	    |    	|
+ *  | private	| shared (+)	|      private	 | private  | private  	|
+ *  |		|		|       	 |   	    |    	|
+ *  |		|		|       	 |   	    |    	|
+ *  | slave	| shared (+++)	|      slave     | slave    | slave  	|
+ *  |		|		|       	 |   	    |    	|
+ *  |		|		|       	 |   	    |    	|
+ *  | unclonable|  invalid	|     unclonable |unclonable| unclonable|
+ *  |		|		|       	 |   	    |    	|
+ *  |		|		|       	 |   	    |    	|
+ *   ********************************************************************
+ *
+ * (+++)  the mount is propogated to all the vfsmounts in the pnode tree
+ *    	  of the destination vfsmount, and all the new mounts is
+ *    	  added to a new pnode , which is a slave pnode of the
+ *    	  source vfsmount's pnode.
+ *
+ *
  * if the source mount is a tree, the operations explained above is
  * applied to each vfsmount in the tree.
  *
@@ -815,7 +847,7 @@ static void abort_attach_recursive_mnt(s
  *
   */
 static int attach_recursive_mnt(struct vfsmount *source_mnt,
-		struct nameidata *nd)
+		struct nameidata *nd, int move)
 {
 	struct vfsmount *mntpt_mnt, *last, *m, *p;
 	struct vfspnode *src_pnode, *dest_pnode, *tmp_pnode;
@@ -849,8 +881,8 @@ static int attach_recursive_mnt(struct v
 	list_add_tail(&mnt_list_head, &source_mnt->mnt_list);
 
 	for (m = source_mnt; m; m = next_mnt(m, source_mnt)) {
-
-		BUG_ON(IS_MNT_UNCLONE(m));
+		int unclone = IS_MNT_UNCLONE(m);
+		int slave = IS_MNT_SLAVE(m);
 
 		while (p && p != m->mnt_parent)
 			p = p->mnt_parent;
@@ -866,7 +898,7 @@ static int attach_recursive_mnt(struct v
 
 		dest_pnode = IS_MNT_SHARED(mntpt_mnt) ?
 			mntpt_mnt->mnt_pnode : NULL;
-		src_pnode = (IS_MNT_SHARED(m))?
+		src_pnode = (IS_MNT_SHARED(m) || (move && slave))?
 				m->mnt_pnode : NULL;
 
 		/*
@@ -882,7 +914,7 @@ static int attach_recursive_mnt(struct v
 		list_del_init(&m->mnt_list);
 		list_add_tail(&tmp_pnode->pnode_peer_slave, &pnodehead);
 
-		if (dest_pnode) {
+		if (dest_pnode && !unclone) {
 			if ((ret = pnode_prepare_mount(dest_pnode, tmp_pnode,
 					mntpt_dentry, m, mntpt_mnt))) {
 				tmp_pnode->pnode_master = src_pnode;
@@ -890,23 +922,33 @@ static int attach_recursive_mnt(struct v
 				last = m;
 				goto error;
 			}
+			if (move && dest_pnode && slave)
+ 				SET_PNODE_SLAVE(tmp_pnode);
 		} else {
 			if (m == m->mnt_parent)
 				do_attach_prepare_mnt(mntpt_mnt,
 					mntpt_dentry, m, 0);
-			pnode_add_member_mnt(tmp_pnode, m);
-			if (!src_pnode) {
-				set_mnt_private(m);
+			if (move && slave)
+				pnode_add_slave_mnt(tmp_pnode, m);
+			else {
+				pnode_add_member_mnt(tmp_pnode, m);
+				if (unclone) {
+					BUG_ON(!move);
+					set_mnt_unclone(m);
+					m->mnt_pnode = tmp_pnode;
+					SET_PNODE_DELETE(tmp_pnode);
+				} else if (!src_pnode) {
+					set_mnt_private(m);
+					m->mnt_pnode = tmp_pnode;
+					SET_PNODE_DELETE(tmp_pnode);
+				}
 				/*
-				 * NOTE: set_mnt_private()
-				 * resets m->mnt_pnode.
-				 * Reinitialize it. This is needed to
-				 * decrement the refcount on the
-				 * pnode when the mount 'm' is
-				 * unlinked in pnode_commit_mount().
+				 * NOTE: set_mnt_private() & set_mnt_unclone()
+				 * resets m->mnt_pnode. Hence reinitialize it.
+				 * We need this to decrement the refcount
+				 * on the pnode when the mount 'm' is
+				 * unlinked in pnode_commit_mount()
 				 */
-				m->mnt_pnode = tmp_pnode;
-				SET_PNODE_DELETE(tmp_pnode);
 			}
 		}
 
@@ -931,6 +973,46 @@ error:
 	return 1;
 }
 
+static void
+detach_recursive_mnt(struct vfsmount *source_mnt, struct nameidata *nd)
+{
+	struct vfsmount *m;
+
+	detach_mnt(source_mnt, nd);
+	spin_lock(&vfspnode_lock);
+	for (m = source_mnt; m; m = next_mnt(m, source_mnt)) {
+		list_del_init(&m->mnt_pnode_mntlist);
+		list_del_init(&m->mnt_list);
+		if (m != source_mnt)
+			list_add_tail(&m->mnt_list, &source_mnt->mnt_list);
+	}
+	spin_unlock(&vfspnode_lock);
+}
+
+static void
+undo_detach_recursive_mnt(struct vfsmount *mnt, struct nameidata *nd)
+{
+	struct vfsmount *m;
+	LIST_HEAD(head);
+
+	spin_lock(&vfspnode_lock);
+	for (m = mnt; m; m = next_mnt(m, mnt)) {
+		if (m->mnt_pnode) {
+			if (IS_MNT_SHARED(m))
+				list_add(&m->mnt_pnode_mntlist,
+					&m->mnt_pnode->pnode_vfs);
+			if (IS_MNT_SLAVE(m))
+				list_add(&m->mnt_pnode_mntlist,
+					&m->mnt_pnode->pnode_slavevfs);
+		}
+	}
+	attach_mnt(mnt, nd);
+	spin_unlock(&vfspnode_lock);
+
+	list_add_tail(&head, &mnt->mnt_list);
+	list_splice(&head, nd->mnt->mnt_namespace->list.prev);
+}
+
 static int graft_tree(struct vfsmount *mnt, struct nameidata *nd)
 {
 	int err, ret;
@@ -957,7 +1039,7 @@ static int graft_tree(struct vfsmount *m
 	ret = (IS_ROOT(nd->dentry) || !d_unhashed(nd->dentry));
 	spin_unlock(&vfsmount_lock);
 	if (ret)
-		err = attach_recursive_mnt(mnt, nd);
+		err = attach_recursive_mnt(mnt, nd, 0);
 out_unlock:
 	up(&nd->dentry->d_inode->i_sem);
 	if (!err)
@@ -1311,6 +1393,19 @@ static int do_remount(struct nameidata *
 	return err;
 }
 
+/*
+ * return 1 if the mount tree contains a unclonable mount
+ */
+static inline int tree_contains_unclone(struct vfsmount *mnt)
+{
+	struct vfsmount *p;
+	for (p = mnt; p; p = next_mnt(p, mnt)) {
+		if (IS_MNT_UNCLONE(p))
+			return 1;
+	}
+	return 0;
+}
+
 static int do_move_mount(struct nameidata *nd, char *old_name)
 {
 	struct nameidata old_nd, parent_nd;
@@ -1351,14 +1446,35 @@ static int do_move_mount(struct nameidat
 	      S_ISDIR(old_nd.dentry->d_inode->i_mode))
 		goto out2;
 
+	/*
+	 * Don't move a mount in a shared parent.
+	 */
+	if (old_nd.mnt->mnt_parent &&
+		IS_MNT_SHARED(old_nd.mnt->mnt_parent))
+		goto out2;
+
+	/*
+	 * Don't move a mount tree having unclonable
+	 * mounts, under a shared mount
+	 */
+	if (IS_MNT_SHARED(nd->mnt) &&
+		tree_contains_unclone(old_nd.mnt))
+		goto out2;
+
 	err = -ELOOP;
 	for (p = nd->mnt; p->mnt_parent!=p; p = p->mnt_parent)
 		if (p == old_nd.mnt)
 			goto out2;
 	err = 0;
 
-	detach_mnt(old_nd.mnt, &parent_nd);
-	attach_mnt(old_nd.mnt, nd);
+	detach_recursive_mnt(old_nd.mnt, &parent_nd);
+	spin_unlock(&vfsmount_lock);
+	if ((err = attach_recursive_mnt(old_nd.mnt, nd, 1))) {
+		undo_detach_recursive_mnt(old_nd.mnt, &parent_nd);
+		goto out1;
+	}
+	spin_lock(&vfsmount_lock);
+	mntput(old_nd.mnt);
 
 	/* if the mount is moved, it should no longer be expire
 	 * automatically */
@@ -1949,6 +2065,16 @@ asmlinkage long sys_pivot_root(const cha
 		goto out2; /* not a mountpoint */
 	if (new_nd.mnt->mnt_root != new_nd.dentry)
 		goto out2; /* not a mountpoint */
+	/*
+	 * Don't move a mount in a shared parent.
+	 */
+	if(user_nd.mnt->mnt_parent &&
+		IS_MNT_SHARED(user_nd.mnt->mnt_parent))
+		goto out2;
+	if(new_nd.mnt->mnt_parent &&
+		IS_MNT_SHARED(new_nd.mnt->mnt_parent))
+		goto out2;
+
 	tmp = old_nd.mnt; /* make sure we can reach put_old from new_root */
 	spin_lock(&vfsmount_lock);
 	if (tmp != new_nd.mnt) {
@@ -1963,10 +2089,30 @@ asmlinkage long sys_pivot_root(const cha
 			goto out3;
 	} else if (!is_subdir(old_nd.dentry, new_nd.dentry))
 		goto out3;
-	detach_mnt(new_nd.mnt, &parent_nd);
-	detach_mnt(user_nd.mnt, &root_parent);
-	attach_mnt(user_nd.mnt, &old_nd);     /* mount old root on put_old */
-	attach_mnt(new_nd.mnt, &root_parent); /* mount new_root on / */
+
+	detach_recursive_mnt(user_nd.mnt, &root_parent);
+	detach_recursive_mnt(new_nd.mnt, &parent_nd);
+
+	spin_unlock(&vfsmount_lock);
+ 	if ((error = attach_recursive_mnt(user_nd.mnt, &old_nd, 1))) {
+		spin_lock(&vfsmount_lock);
+		undo_detach_recursive_mnt(new_nd.mnt, &parent_nd);
+		undo_detach_recursive_mnt(user_nd.mnt, &root_parent);
+		goto out3;
+	}
+	spin_lock(&vfsmount_lock);
+ 	mntput(user_nd.mnt);
+
+	spin_unlock(&vfsmount_lock);
+ 	if ((error = attach_recursive_mnt(new_nd.mnt, &root_parent, 1))) {
+		spin_lock(&vfsmount_lock);
+		undo_detach_recursive_mnt(new_nd.mnt, &parent_nd);
+		undo_detach_recursive_mnt(user_nd.mnt, &root_parent);
+		goto out3;
+	}
+	spin_lock(&vfsmount_lock);
+ 	mntput(new_nd.mnt);
+
 	spin_unlock(&vfsmount_lock);
 	chroot_fs_refs(&user_nd, &new_nd);
 	security_sb_post_pivotroot(&user_nd, &new_nd);
Index: 2.6.12.work2/include/linux/mount.h
===================================================================
--- 2.6.12.work2.orig/include/linux/mount.h
+++ 2.6.12.work2/include/linux/mount.h
@@ -29,6 +29,8 @@
 #define IS_MNT_SLAVE(mnt) (mnt->mnt_flags & MNT_SLAVE)
 #define IS_MNT_PRIVATE(mnt) (mnt->mnt_flags & MNT_PRIVATE)
 #define IS_MNT_UNCLONE(mnt) (mnt->mnt_flags & MNT_UNCLONE)
+#define GET_MNT_TYPE(mnt) (mnt->mnt_flags & MNT_PNODE_MASK)
+#define SET_MNT_TYPE(mnt, type) (mnt->mnt_flags |= (type & MNT_PNODE_MASK))
 
 #define CLEAR_MNT_SHARED(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SHARED))
 #define CLEAR_MNT_PRIVATE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_PRIVATE))


* (unknown)
  2005-07-25 22:44 (unknown) Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
@ 2005-07-25 22:44 ` Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 211+ messages in thread
From: Ram Pai @ 2005-07-25 22:44 UTC (permalink / raw)
  To: akpm, Al Viro; +Cc: Avantika Mathur, Mike Waychison, miklos@szeredi.hu, Janak Desai <janak@us.ibm.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 3/7] shared subtree
Content-Type: text/x-patch; name=rbind.patch
Content-Disposition: inline; filename=rbind.patch

Adds the ability to bind/rbind a shared/private/slave subtree and set up
propagation wherever needed.
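
For readers skimming the thread, the bind semantics documented in the
comment above attach_recursive_mnt() in this patch can be restated as a
small lookup table. This is an illustrative userspace sketch only, not
code from the patch: the enum names, bind_table and bind_result_for()
are hypothetical, and propagation itself is not modelled.

```c
#include <assert.h>

/* Hypothetical mount types, mirroring the four columns of the table. */
enum mnt_type { MT_SHARED, MT_PRIVATE, MT_SLAVE, MT_UNCLONABLE, MT_COUNT };

/*
 * Hypothetical outcomes of a bind:
 *  BR_SHARED_PLUSPLUS  "shared (++)": propagated through the destination's
 *                      pnode tree
 *  BR_SHARED_PLUS      "shared (+)":  new mount joins the source's pnode
 *  BR_PRIVATE          new mount is private
 *  BR_NOMOUNT          bind is refused
 */
enum bind_result { BR_NOMOUNT, BR_PRIVATE, BR_SHARED_PLUS, BR_SHARED_PLUSPLUS };

/* Rows are the source type; columns are the destination type. */
static const enum bind_result bind_table[MT_COUNT][MT_COUNT] = {
	[MT_SHARED]     = { BR_SHARED_PLUSPLUS, BR_SHARED_PLUS,
			    BR_SHARED_PLUS, BR_SHARED_PLUS },
	[MT_PRIVATE]    = { BR_SHARED_PLUS, BR_PRIVATE,
			    BR_PRIVATE, BR_PRIVATE },
	[MT_SLAVE]      = { BR_SHARED_PLUS, BR_PRIVATE,
			    BR_PRIVATE, BR_PRIVATE },
	[MT_UNCLONABLE] = { BR_NOMOUNT, BR_NOMOUNT,
			    BR_NOMOUNT, BR_NOMOUNT },
};

/* Look up the table entry for binding a 'src' mount onto a 'dest' mount. */
enum bind_result bind_result_for(enum mnt_type src, enum mnt_type dest)
{
	assert((unsigned int)src < MT_COUNT);
	assert((unsigned int)dest < MT_COUNT);
	return bind_table[src][dest];
}
```

For instance, bind_result_for(MT_PRIVATE, MT_SHARED) gives BR_SHARED_PLUS,
and any bind from an unclonable source gives BR_NOMOUNT, matching the
-EINVAL check added to do_loopback().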

RP

Signed by Ram Pai (linuxram@us.ibm.com)

 fs/namespace.c            |  660 ++++++++++++++++++++++++++++++++++++++++------
 fs/pnode.c                |  235 ++++++++++++++++
 include/linux/dcache.h    |    2 
 include/linux/fs.h        |    5 
 include/linux/namespace.h |    1 
 5 files changed, 826 insertions(+), 77 deletions(-)

Index: 2.6.12.work2/fs/namespace.c
===================================================================
--- 2.6.12.work2.orig/fs/namespace.c
+++ 2.6.12.work2/fs/namespace.c
@@ -42,7 +42,8 @@ static inline int sysfs_init(void)
 
 static struct list_head *mount_hashtable;
 static int hash_mask, hash_bits;
-static kmem_cache_t *mnt_cache; 
+static kmem_cache_t *mnt_cache;
+static struct rw_semaphore namespace_sem;
 
 static inline unsigned long hash(struct vfsmount *mnt, struct dentry *dentry)
 {
@@ -54,7 +55,7 @@ static inline unsigned long hash(struct 
 
 struct vfsmount *alloc_vfsmnt(const char *name)
 {
-	struct vfsmount *mnt = kmem_cache_alloc(mnt_cache, GFP_KERNEL); 
+	struct vfsmount *mnt = kmem_cache_alloc(mnt_cache, GFP_KERNEL);
 	if (mnt) {
 		memset(mnt, 0, sizeof(struct vfsmount));
 		atomic_set(&mnt->mnt_count,1);
@@ -86,7 +87,8 @@ void free_vfsmnt(struct vfsmount *mnt)
  * Now, lookup_mnt increments the ref count before returning
  * the vfsmount struct.
  */
-struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
+struct vfsmount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry,
+		struct dentry *root)
 {
 	struct list_head * head = mount_hashtable + hash(mnt, dentry);
 	struct list_head * tmp = head;
@@ -99,7 +101,8 @@ struct vfsmount *lookup_mnt(struct vfsmo
 		if (tmp == head)
 			break;
 		p = list_entry(tmp, struct vfsmount, mnt_hash);
-		if (p->mnt_parent == mnt && p->mnt_mountpoint == dentry) {
+		if (p->mnt_parent == mnt && p->mnt_mountpoint == dentry &&
+				(root == NULL || p->mnt_root == root)) {
 			found = mntget(p);
 			break;
 		}
@@ -108,6 +111,37 @@ struct vfsmount *lookup_mnt(struct vfsmo
 	return found;
 }
 
+struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
+{
+	return __lookup_mnt(mnt, dentry, NULL);
+}
+
+static struct vfsmount *
+clone_mnt(struct vfsmount *old, struct dentry *root)
+{
+	struct super_block *sb = old->mnt_sb;
+	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+
+	if (mnt) {
+		mnt->mnt_flags = old->mnt_flags;
+		atomic_inc(&sb->s_active);
+		mnt->mnt_sb = sb;
+		mnt->mnt_root = dget(root);
+		mnt->mnt_mountpoint = mnt->mnt_root;
+		mnt->mnt_parent = mnt;
+		mnt->mnt_namespace = old->mnt_namespace;
+		mnt->mnt_pnode = get_pnode(old->mnt_pnode);
+
+		/* stick the duplicate mount on the same expiry list
+		 * as the original if that was on one */
+		spin_lock(&vfsmount_lock);
+		if (!list_empty(&old->mnt_fslink))
+			list_add(&mnt->mnt_fslink, &old->mnt_fslink);
+		spin_unlock(&vfsmount_lock);
+	}
+	return mnt;
+}
+
 static inline int check_mnt(struct vfsmount *mnt)
 {
 	return mnt->mnt_namespace == current->namespace;
@@ -128,11 +162,71 @@ static void attach_mnt(struct vfsmount *
 {
 	mnt->mnt_parent = mntget(nd->mnt);
 	mnt->mnt_mountpoint = dget(nd->dentry);
-	list_add(&mnt->mnt_hash, mount_hashtable+hash(nd->mnt, nd->dentry));
+	mnt->mnt_namespace = nd->mnt->mnt_namespace;
+	list_add_tail(&mnt->mnt_hash,
+			mount_hashtable+hash(nd->mnt, nd->dentry));
 	list_add_tail(&mnt->mnt_child, &nd->mnt->mnt_mounts);
 	nd->dentry->d_mounted++;
 }
 
+static void attach_prepare_mnt(struct vfsmount *mnt, struct nameidata *nd)
+{
+	mnt->mnt_parent = mntget(nd->mnt);
+	mnt->mnt_mountpoint = dget(nd->dentry);
+	nd->dentry->d_mounted++;
+}
+
+
+void do_attach_commit_mnt(struct vfsmount *mnt)
+{
+	struct vfsmount *parent = mnt->mnt_parent;
+	BUG_ON(parent==mnt);
+	if(list_empty(&mnt->mnt_hash))
+		list_add_tail(&mnt->mnt_hash,
+			mount_hashtable+hash(parent, mnt->mnt_mountpoint));
+	if(list_empty(&mnt->mnt_child))
+		list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
+	mnt->mnt_namespace = parent->mnt_namespace;
+	list_add_tail(&mnt->mnt_list, &mnt->mnt_namespace->list);
+}
+
+struct vfsmount *do_attach_prepare_mnt(struct vfsmount *mnt,
+		struct dentry *dentry,
+		struct vfsmount *template_mnt,
+		int clone_flag)
+{
+	struct vfsmount *child_mnt;
+	struct nameidata nd;
+
+	if (clone_flag) {
+		if(!(child_mnt = clone_mnt(template_mnt,
+				template_mnt->mnt_root)))
+			return NULL;
+	} else
+		child_mnt = template_mnt;
+
+	nd.mnt = mnt;
+	nd.dentry = dentry;
+
+	attach_prepare_mnt(child_mnt, &nd);
+
+	return child_mnt;
+}
+
+void do_detach_prepare_mnt(struct vfsmount *mnt, int free_flag)
+{
+	mnt->mnt_mountpoint->d_mounted--;
+	mntput(mnt->mnt_parent);
+	dput(mnt->mnt_mountpoint);
+	if (free_flag) {
+		BUG_ON(atomic_read(&mnt->mnt_count) != 1);
+		spin_lock(&vfsmount_lock);
+		list_del_init(&mnt->mnt_fslink);
+		spin_unlock(&vfsmount_lock);
+		mntput(mnt);
+	}
+}
+
 static struct vfsmount *next_mnt(struct vfsmount *p, struct vfsmount *root)
 {
 	struct list_head *next = p->mnt_mounts.next;
@@ -149,29 +243,14 @@ static struct vfsmount *next_mnt(struct 
 	return list_entry(next, struct vfsmount, mnt_child);
 }
 
-static struct vfsmount *
-clone_mnt(struct vfsmount *old, struct dentry *root)
+static struct vfsmount *skip_mnt_tree(struct vfsmount *p)
 {
-	struct super_block *sb = old->mnt_sb;
-	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
-
-	if (mnt) {
-		mnt->mnt_flags = old->mnt_flags;
-		atomic_inc(&sb->s_active);
-		mnt->mnt_sb = sb;
-		mnt->mnt_root = dget(root);
-		mnt->mnt_mountpoint = mnt->mnt_root;
-		mnt->mnt_parent = mnt;
-		mnt->mnt_namespace = old->mnt_namespace;
-
-		/* stick the duplicate mount on the same expiry list
-		 * as the original if that was on one */
-		spin_lock(&vfsmount_lock);
-		if (!list_empty(&old->mnt_fslink))
-			list_add(&mnt->mnt_fslink, &old->mnt_fslink);
-		spin_unlock(&vfsmount_lock);
+	struct list_head *prev = p->mnt_mounts.prev;
+	while (prev != &p->mnt_mounts) {
+		p = list_entry(prev, struct vfsmount, mnt_child);
+		prev = p->mnt_mounts.prev;
 	}
-	return mnt;
+	return p;
 }
 
 void __mntput(struct vfsmount *mnt)
@@ -191,7 +270,7 @@ static void *m_start(struct seq_file *m,
 	struct list_head *p;
 	loff_t l = *pos;
 
-	down_read(&n->sem);
+	down_read(&namespace_sem);
 	list_for_each(p, &n->list)
 		if (!l--)
 			return list_entry(p, struct vfsmount, mnt_list);
@@ -208,8 +287,7 @@ static void *m_next(struct seq_file *m, 
 
 static void m_stop(struct seq_file *m, void *v)
 {
-	struct namespace *n = m->private;
-	up_read(&n->sem);
+	up_read(&namespace_sem);
 }
 
 static inline void mangle(struct seq_file *m, const char *s)
@@ -433,7 +511,7 @@ static int do_umount(struct vfsmount *mn
 		return retval;
 	}
 
-	down_write(&current->namespace->sem);
+	down_write(&namespace_sem);
 	spin_lock(&vfsmount_lock);
 
 	if (atomic_read(&sb->s_active) == 1) {
@@ -455,7 +533,7 @@ static int do_umount(struct vfsmount *mn
 	spin_unlock(&vfsmount_lock);
 	if (retval)
 		security_sb_umount_busy(mnt);
-	up_write(&current->namespace->sem);
+	up_write(&namespace_sem);
 	return retval;
 }
 
@@ -495,9 +573,9 @@ out:
 #ifdef __ARCH_WANT_SYS_OLDUMOUNT
 
 /*
- *	The 2.0 compatible umount. No flags. 
+ *	The 2.0 compatible umount. No flags.
  */
- 
+
 asmlinkage long sys_oldumount(char __user * name)
 {
 	return sys_umount(name,0);
@@ -541,6 +619,9 @@ static struct vfsmount *copy_tree(struct
 	struct list_head *h;
 	struct nameidata nd;
 
+	if (IS_MNT_UNCLONE(mnt))
+		return NULL;
+
 	res = q = clone_mnt(mnt, dentry);
 	if (!q)
 		goto Enomem;
@@ -549,10 +630,15 @@ static struct vfsmount *copy_tree(struct
 	p = mnt;
 	for (h = mnt->mnt_mounts.next; h != &mnt->mnt_mounts; h = h->next) {
 		r = list_entry(h, struct vfsmount, mnt_child);
+
 		if (!lives_below_in_same_fs(r->mnt_mountpoint, dentry))
 			continue;
 
 		for (s = r; s; s = next_mnt(s, r)) {
+			if (IS_MNT_UNCLONE(s)) {
+				s = skip_mnt_tree(s);
+				continue;
+			}
 			while (p != s->mnt_parent) {
 				p = p->mnt_parent;
 				q = q->mnt_parent;
@@ -579,9 +665,276 @@ static struct vfsmount *copy_tree(struct
 	return NULL;
 }
 
+/*
+ * return 1 if the mount tree contains a shared or slave mount
+ */
+static inline int tree_contains_sharedorslave(struct vfsmount *mnt)
+{
+	struct vfsmount *p;
+	for (p = mnt; p; p = next_mnt(p, mnt)) {
+		if (IS_MNT_SHARED(p) || IS_MNT_SLAVE(p))
+			return 1;
+	}
+	return 0;
+}
+
+/*
+ * commit the operations done in attach_recursive_mnt(). run through pnode list
+ * headed at 'pnodehead', and commit the operation done in
+ * attach_recursive_mnt();
+ */
+
+static void commit_attach_recursive_mnt(struct list_head *pnodehead)
+{
+	struct vfspnode *t_p, *tmp_pnode;
+
+	/*
+	 * Merge or delete or slave each of the temporary pnodes
+	 */
+	spin_lock(&vfsmount_lock);
+	list_for_each_entry_safe(tmp_pnode, t_p, pnodehead,
+			pnode_peer_slave) {
+
+		int del_flag = IS_PNODE_DELETE(tmp_pnode);
+		int slave_flag = IS_PNODE_SLAVE(tmp_pnode);
+		struct vfspnode *master_pnode = tmp_pnode->pnode_master;
+
+		list_del_init(&tmp_pnode->pnode_peer_slave);
+		pnode_commit_mount(tmp_pnode, del_flag);
+
+		if (!del_flag && master_pnode) {
+			tmp_pnode->pnode_master = NULL;
+
+			if (slave_flag)
+				pnode_add_slave_pnode(master_pnode, tmp_pnode);
+			else
+				pnode_merge_pnode(tmp_pnode, master_pnode);
+
+			/*
+			 * we don't need the extra reference to
+			 * the master_pnode, which was created either
+			 * (a) pnode_add_slave_pnode: when the mnt
+			 * 	was made as a slave mnt.
+			 * (b) pnode_merge_pnode: during clone_mnt().
+			 */
+			put_pnode(master_pnode);
+		}
+	}
+	spin_unlock(&vfsmount_lock);
+}
+
+/*
+ * abort the operations done in attach_recursive_mnt(). run through the mount
+ * tree, till vfsmount 'last' and undo the changes.  Ensure that all the mounts
+ * in the tree are all back in the mnt_list headed at 'source_mnt'.
+ * NOTE: This function is closely tied to the logic in
+ * 'attach_recursive_mnt()'
+ */
+static void abort_attach_recursive_mnt(struct vfsmount *source_mnt,
+		struct vfsmount *last, struct list_head *head)
+{
+	struct vfsmount *p = source_mnt, *m;
+	struct vfspnode *src_pnode;
+
+	if (!last)
+		return;
+
+	do {
+		int is_unclone, is_pnode_slave;
+
+		m = p;
+		is_unclone = IS_MNT_UNCLONE(m);
+
+		BUG_ON(!m->mnt_pnode);
+
+		is_pnode_slave = IS_PNODE_SLAVE(m->mnt_pnode);
+		src_pnode = m->mnt_pnode->pnode_master;
+		m->mnt_pnode->pnode_master = NULL;
+		pnode_abort_mount(m->mnt_pnode, m);
+
+		m->mnt_pnode = src_pnode;
+		if (src_pnode) {
+			if (is_pnode_slave)
+				set_mnt_slave(m);
+			else
+				set_mnt_shared(m);
+		} else {
+			if (is_unclone)
+				set_mnt_unclone(m);
+			else
+				set_mnt_private(m);
+		}
+
+		list_add_tail(&m->mnt_list, head);
+		p = next_mnt(m, source_mnt);
+	} while (p && m != last);
+	source_mnt->mnt_parent = source_mnt;
+	list_del_init(head);
+}
+
+ /*
+ *  @source_mnt : mount tree to be attached
+ *  @nd		: place where the mount tree @source_mnt is attached
+ *
+ *  NOTE: the table below explains the semantics when a source vfsmount
+ *  of a given type is attached to a destination vfsmount of a given type.
+ *  ---------------------------------------------------------------------
+ *  |				BIND MOUNT OPERATION			|
+ *  |*******************************************************************|
+ *  |  dest --> | shared	|	private	 |  slave   |unclonable	|
+ *  | source	|		|       	 |   	    |    	|
+ *  |   |   	|		|       	 |   	    |    	|
+ *  |   v 	|		|       	 |   	    |    	|
+ *  |*******************************************************************|
+ *  |	     	|		|       	 |   	    |    	|
+ *  |  shared	| shared (++) 	|      shared (+)|shared (+)| shared (+)|
+ *  |		|		|       	 |   	    |    	|
+ *  |		|		|       	 |   	    |    	|
+ *  | private	| shared (+)	|      private	 | private  | private  	|
+ *  |		|		|       	 |   	    |    	|
+ *  |		|		|       	 |   	    |    	|
+ *  | slave	| shared (+)	|      private   | private  | private  	|
+ *  |		|		|       	 |   	    |    	|
+ *  |		|		|       	 |   	    |    	|
+ *  | unclonable|    nomount	|       nomount	 |  nomount | nomount 	|
+ *  |		|		|       	 |   	    |    	|
+ *  |		|		|       	 |   	    |    	|
+ *   ********************************************************************
+ *
+ * (++)  the mount will be propagated to all the vfsmounts in the pnode tree
+ *    	  of the destination vfsmount, and all the non-slave new mounts in
+ *    	  destination vfsmount will be added to the source vfsmount's pnode.
+ * (+)  the mount will be propagated to the destination vfsmount
+ *    	  and the new mount will be added to the source vfsmount's pnode.
+ *
+ * if the source mount is a tree, the operations explained above are
+ * applied to each vfsmount in the tree.
+ *
+ * Should be called without spinlocks held, because this function can sleep
+ * in allocations.
+ *
+  */
+static int attach_recursive_mnt(struct vfsmount *source_mnt,
+		struct nameidata *nd)
+{
+	struct vfsmount *mntpt_mnt, *last, *m, *p;
+	struct vfspnode *src_pnode, *dest_pnode, *tmp_pnode;
+	struct dentry *mntpt_dentry;
+	int ret;
+	LIST_HEAD(pnodehead);
+	LIST_HEAD(mnt_list_head);
+
+	/*
+	 * if the source tree has no shared or slave mounts and
+	 * the destination mount is not shared, fastpath.
+	 */
+	mntpt_mnt = nd->mnt;
+	dest_pnode = IS_MNT_SHARED(mntpt_mnt) ? mntpt_mnt->mnt_pnode : NULL;
+	if (!dest_pnode && !tree_contains_sharedorslave(source_mnt)) {
+		spin_lock(&vfsmount_lock);
+		attach_mnt(source_mnt, nd);
+		list_add_tail(&mnt_list_head, &source_mnt->mnt_list);
+		list_splice(&mnt_list_head, mntpt_mnt->mnt_namespace->list.prev);
+		spin_unlock(&vfsmount_lock);
+		goto out;
+	}
+
+	/*
+	 * Create temporary pnodes which shall hold all the new
+	 * mounts. Merge or delete or slave that pnode later in a separate
+	 * operation, depending on the type of source and destination mounts.
+	 */
+	p = NULL;
+	last = NULL;
+	list_add_tail(&mnt_list_head, &source_mnt->mnt_list);
+
+	for (m = source_mnt; m; m = next_mnt(m, source_mnt)) {
+
+		BUG_ON(IS_MNT_UNCLONE(m));
+
+		while (p && p != m->mnt_parent)
+			p = p->mnt_parent;
+
+		if (!p) {
+			mntpt_dentry = nd->dentry;
+			mntpt_mnt = nd->mnt;
+		} else {
+			mntpt_dentry = m->mnt_mountpoint;
+			mntpt_mnt    = p;
+		}
+		p=m;
+
+		dest_pnode = IS_MNT_SHARED(mntpt_mnt) ?
+			mntpt_mnt->mnt_pnode : NULL;
+		src_pnode = (IS_MNT_SHARED(m))?
+				m->mnt_pnode : NULL;
+
+		/*
+		 * get a temporary pnode to which the new vfsmount is added,
+		 * and keep track of these pnodes and their real pnodes.
+		 */
+		if (!(tmp_pnode = pnode_alloc())) {
+			ret =  -ENOMEM;
+			goto error;
+		}
+
+		m->mnt_pnode = NULL;
+		list_del_init(&m->mnt_list);
+		list_add_tail(&tmp_pnode->pnode_peer_slave, &pnodehead);
+
+		if (dest_pnode) {
+			if ((ret = pnode_prepare_mount(dest_pnode, tmp_pnode,
+					mntpt_dentry, m, mntpt_mnt))) {
+				tmp_pnode->pnode_master = src_pnode;
+				m->mnt_pnode = tmp_pnode;
+				last = m;
+				goto error;
+			}
+		} else {
+			if (m == m->mnt_parent)
+				do_attach_prepare_mnt(mntpt_mnt,
+					mntpt_dentry, m, 0);
+			pnode_add_member_mnt(tmp_pnode, m);
+			if (!src_pnode) {
+				set_mnt_private(m);
+				/*
+				 * NOTE: set_mnt_private()
+				 * resets m->mnt_pnode.
+				 * Reinitialize it. This is needed to
+				 * decrement the refcount on the
+				 * pnode when the mount 'm' is
+				 * unlinked in pnode_commit_mount().
+				 */
+				m->mnt_pnode = tmp_pnode;
+				SET_PNODE_DELETE(tmp_pnode);
+			}
+		}
+
+		/*
+		 * temporarily track, in the pnode_master field, the pnode
+		 * with which the tmp_pnode has to merge.
+		 */
+		tmp_pnode->pnode_master = src_pnode;
+		last = m;
+	}
+	commit_attach_recursive_mnt(&pnodehead);
+out:
+	mntget(source_mnt);
+	return 0;
+error:
+	/*
+	 * ok we have errored out either because of memory exhaustion
+	 * or something else not in our control. Gracefully return
+	 * leaving no mess behind. Else it will haunt you. :(
+	 */
+	abort_attach_recursive_mnt(source_mnt, last, &mnt_list_head);
+	return 1;
+}
+
 static int graft_tree(struct vfsmount *mnt, struct nameidata *nd)
 {
-	int err;
+	int err, ret;
+
 	if (mnt->mnt_sb->s_flags & MS_NOUSER)
 		return -EINVAL;
 
@@ -599,17 +952,12 @@ static int graft_tree(struct vfsmount *m
 		goto out_unlock;
 
 	err = -ENOENT;
-	spin_lock(&vfsmount_lock);
-	if (IS_ROOT(nd->dentry) || !d_unhashed(nd->dentry)) {
-		struct list_head head;
 
-		attach_mnt(mnt, nd);
-		list_add_tail(&head, &mnt->mnt_list);
-		list_splice(&head, current->namespace->list.prev);
-		mntget(mnt);
-		err = 0;
-	}
+	spin_lock(&vfsmount_lock);
+	ret = (IS_ROOT(nd->dentry) || !d_unhashed(nd->dentry));
 	spin_unlock(&vfsmount_lock);
+	if (ret)
+		err = attach_recursive_mnt(mnt, nd);
 out_unlock:
 	up(&nd->dentry->d_inode->i_sem);
 	if (!err)
@@ -681,6 +1029,147 @@ static int do_make_unclone(struct vfsmou
 	return 0;
 }
 
+ /*
+ * This operation is equivalent to mount --bind dir dir
+ * create a new mount at the dentry, and unmount all child mounts
+ * mounted on top of dentries below 'dentry', and mount them
+ * under the new mount.
+  */
+struct vfsmount *do_make_mounted(struct vfsmount *mnt, struct dentry *dentry)
+{
+	struct vfsmount *child_mnt, *next;
+	struct nameidata nd;
+	struct vfsmount *newmnt = clone_mnt(mnt, dentry);
+	LIST_HEAD(head);
+
+	if (newmnt) {
+		/*
+		 * note clone_mnt() gets a reference to the pnode.
+		 * we won't use that pnode anyway. So just let it
+		 * go
+		 */
+		put_pnode(newmnt->mnt_pnode);
+		newmnt->mnt_pnode = NULL;
+
+		/*
+		 * walk through the mount list of mnt and move
+		 * them under the new mount
+		 */
+		spin_lock(&vfsmount_lock);
+		list_del_init(&newmnt->mnt_fslink);
+
+		list_for_each_entry_safe(child_mnt, next,
+				&mnt->mnt_mounts, mnt_child) {
+
+			if(child_mnt->mnt_mountpoint == dentry)
+				continue;
+
+			if(!is_subdir(child_mnt->mnt_mountpoint, dentry))
+				continue;
+
+			detach_mnt(child_mnt, &nd);
+			nd.mnt = newmnt;
+			attach_mnt(child_mnt, &nd);
+		}
+
+		nd.mnt = mnt;
+		nd.dentry = dentry;
+		attach_mnt(newmnt, &nd);
+		list_add_tail(&newmnt->mnt_list, &newmnt->mnt_namespace->list);
+ 		spin_unlock(&vfsmount_lock);
+ 	}
+	return newmnt;
+}
+
+ /*
+ * Inverse operation of do_make_mounted()
+  */
+int do_make_unmounted(struct vfsmount *mnt)
+{
+	struct vfsmount *parent_mnt, *child_mnt, *next;
+	struct nameidata nd;
+
+	/* validate if mount has a different parent */
+	parent_mnt = mnt->mnt_parent;
+	if (mnt == parent_mnt)
+		return 0;
+	/*
+	 * cannot unmount a mount that was not created
+	 * as an overlay mount.
+	 */
+	if (mnt->mnt_mountpoint != mnt->mnt_root)
+		return -EINVAL;
+
+	/* for each submount in the parent, put the mount back */
+	spin_lock(&vfsmount_lock);
+	list_for_each_entry_safe(child_mnt, next, &mnt->mnt_mounts, mnt_child) {
+		detach_mnt(child_mnt, &nd);
+		nd.mnt = parent_mnt;
+		attach_mnt(child_mnt, &nd);
+ 	}
+	detach_mnt(mnt, &nd);
+ 	spin_unlock(&vfsmount_lock);
+	return 0;
+}
+
+/*
+ * @nd: contains the vfsmount and the dentry where the new mount
+ * 	is to be created
+ * @mnt: returns the newly created mount.
+ * Create a new mount at the location specified by 'nd' and
+ * propagate the mount to all other mounts if the mountpoint
+ * is under a shared mount.
+ */
+int make_mounted(struct nameidata *nd, struct vfsmount **mnt)
+{
+	struct vfsmount *parent_mnt;
+	struct dentry *parent_dentry;
+	int err = mount_is_safe(nd);
+	if (err)
+		return err;
+	parent_dentry = nd->dentry;
+	parent_mnt = nd->mnt;
+ 	/*
+	 * check if dentry already has a vfsmount
+	 * if it does not, create and attach
+	 * a new vfsmount at that dentry.
+	 * Also propagate the mount if parent_mnt
+	 * is shared.
+ 	 */
+	if(parent_dentry != parent_mnt->mnt_root) {
+		*mnt = IS_MNT_SHARED(parent_mnt) ?
+			 pnode_make_mounted(parent_mnt->mnt_pnode,
+					 parent_mnt, parent_dentry) :
+			 do_make_mounted(parent_mnt, parent_dentry);
+		if (!*mnt)
+			err = -ENOMEM;
+ 	} else
+		*mnt = parent_mnt;
+	return err;
+}
+
+ /*
+ * Inverse operation of make_mounted()
+  */
+int make_unmounted(struct vfsmount *mnt)
+{
+	if (mnt == mnt->mnt_parent)
+		return 0;
+	/*
+	 * cannot unmount a mount that was not created
+	 * as an overlay mount.
+	 */
+	if (mnt->mnt_mountpoint != mnt->mnt_root)
+		return -EINVAL;
+
+	if (IS_MNT_SHARED(mnt))
+		pnode_make_unmounted(mnt->mnt_pnode);
+ 	else
+		do_make_unmounted(mnt);
+
+	return 0;
+}
+
 /*
  * recursively change the type of the mountpoint.
  */
@@ -724,7 +1213,7 @@ static int do_change_type(struct nameida
 static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
 {
 	struct nameidata old_nd;
-	struct vfsmount *mnt = NULL;
+	struct vfsmount *mnt = NULL, *overlay_mnt=NULL;
 	int err = mount_is_safe(nd);
 	if (err)
 		return err;
@@ -734,14 +1223,31 @@ static int do_loopback(struct nameidata 
 	if (err)
 		return err;
 
-	down_write(&current->namespace->sem);
+	if (IS_MNT_UNCLONE(old_nd.mnt)) {
+		err = -EINVAL;
+		goto path_release;
+	}
+
+	down_write(&namespace_sem);
 	err = -EINVAL;
 	if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) {
+
+		/*
+		 * If the dentry is not the root dentry, and if a bind
+		 * from a shared subtree is attempted, create a mount
+		 * at the dentry, and use the new mount as the starting
+		 * point for the bind/rbind operation.
+		 */
+		overlay_mnt = old_nd.mnt;
+		if(IS_MNT_SHARED(old_nd.mnt) &&
+			(err = make_mounted(&old_nd, &overlay_mnt)))
+			goto out;
+
 		err = -ENOMEM;
 		if (recurse)
-			mnt = copy_tree(old_nd.mnt, old_nd.dentry);
+			mnt = copy_tree(overlay_mnt, old_nd.dentry);
 		else
-			mnt = clone_mnt(old_nd.mnt, old_nd.dentry);
+			mnt = clone_mnt(overlay_mnt, old_nd.dentry);
 	}
 
 	if (mnt) {
@@ -752,15 +1258,25 @@ static int do_loopback(struct nameidata 
 
 		err = graft_tree(mnt, nd);
 		if (err) {
-			spin_lock(&vfsmount_lock);
-			umount_tree(mnt);
-			spin_unlock(&vfsmount_lock);
-		} else
-			mntput(mnt);
-	}
+ 			spin_lock(&vfsmount_lock);
+ 			umount_tree(mnt);
+ 			spin_unlock(&vfsmount_lock);
+			/*
+			 * ok we failed! so undo any overlay
+			 * mount that we did earlier.
+			 */
+			if (old_nd.mnt !=  overlay_mnt)
+				make_unmounted(overlay_mnt);
+ 		} else
+ 			mntput(mnt);
+ 	}
+
+ out:
+	up_write(&namespace_sem);
+
+ path_release:
+ 	path_release(&old_nd);
 
-	up_write(&current->namespace->sem);
-	path_release(&old_nd);
 	return err;
 }
 
@@ -808,7 +1324,7 @@ static int do_move_mount(struct nameidat
 	if (err)
 		return err;
 
-	down_write(&current->namespace->sem);
+	down_write(&namespace_sem);
 	while(d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry))
 		;
 	err = -EINVAL;
@@ -852,7 +1368,7 @@ out2:
 out1:
 	up(&nd->dentry->d_inode->i_sem);
 out:
-	up_write(&current->namespace->sem);
+	up_write(&namespace_sem);
 	if (!err)
 		path_release(&parent_nd);
 	path_release(&old_nd);
@@ -891,7 +1407,7 @@ int do_add_mount(struct vfsmount *newmnt
 {
 	int err;
 
-	down_write(&current->namespace->sem);
+	down_write(&namespace_sem);
 	/* Something was mounted here while we slept */
 	while(d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry))
 		;
@@ -920,7 +1436,7 @@ int do_add_mount(struct vfsmount *newmnt
 	}
 
 unlock:
-	up_write(&current->namespace->sem);
+	up_write(&namespace_sem);
 	mntput(newmnt);
 	return err;
 }
@@ -976,7 +1492,7 @@ void mark_mounts_for_expiry(struct list_
 		get_namespace(namespace);
 
 		spin_unlock(&vfsmount_lock);
-		down_write(&namespace->sem);
+		down_write(&namespace_sem);
 		spin_lock(&vfsmount_lock);
 
 		/* check that it is still dead: the count should now be 2 - as
@@ -1020,7 +1536,7 @@ void mark_mounts_for_expiry(struct list_
 			spin_unlock(&vfsmount_lock);
 		}
 
-		up_write(&namespace->sem);
+		up_write(&namespace_sem);
 
 		mntput(mnt);
 		put_namespace(namespace);
@@ -1066,7 +1582,7 @@ int copy_mount_options(const void __user
 	int i;
 	unsigned long page;
 	unsigned long size;
-	
+
 	*where = 0;
 	if (!data)
 		return 0;
@@ -1085,7 +1601,7 @@ int copy_mount_options(const void __user
 
 	i = size - exact_copy_from_user((void *)page, data, size);
 	if (!i) {
-		free_page(page); 
+		free_page(page);
 		return -EFAULT;
 	}
 	if (i != PAGE_SIZE)
@@ -1191,14 +1707,13 @@ int copy_namespace(int flags, struct tas
 		goto out;
 
 	atomic_set(&new_ns->count, 1);
-	init_rwsem(&new_ns->sem);
 	INIT_LIST_HEAD(&new_ns->list);
 
-	down_write(&tsk->namespace->sem);
+	down_write(&namespace_sem);
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(namespace->root, namespace->root->mnt_root);
 	if (!new_ns->root) {
-		up_write(&tsk->namespace->sem);
+		up_write(&namespace_sem);
 		kfree(new_ns);
 		goto out;
 	}
@@ -1232,7 +1747,7 @@ int copy_namespace(int flags, struct tas
 		p = next_mnt(p, namespace->root);
 		q = next_mnt(q, new_ns->root);
 	}
-	up_write(&tsk->namespace->sem);
+	up_write(&namespace_sem);
 
 	tsk->namespace = new_ns;
 
@@ -1414,7 +1929,7 @@ asmlinkage long sys_pivot_root(const cha
 	user_nd.mnt = mntget(current->fs->rootmnt);
 	user_nd.dentry = dget(current->fs->root);
 	read_unlock(&current->fs->lock);
-	down_write(&current->namespace->sem);
+	down_write(&namespace_sem);
 	down(&old_nd.dentry->d_inode->i_sem);
 	error = -EINVAL;
 	if (!check_mnt(user_nd.mnt))
@@ -1460,7 +1975,7 @@ asmlinkage long sys_pivot_root(const cha
 	path_release(&parent_nd);
 out2:
 	up(&old_nd.dentry->d_inode->i_sem);
-	up_write(&current->namespace->sem);
+	up_write(&namespace_sem);
 	path_release(&user_nd);
 	path_release(&old_nd);
 out1:
@@ -1487,7 +2002,6 @@ static void __init init_mount_tree(void)
 		panic("Can't allocate initial namespace");
 	atomic_set(&namespace->count, 1);
 	INIT_LIST_HEAD(&namespace->list);
-	init_rwsem(&namespace->sem);
 	list_add(&mnt->mnt_list, &namespace->list);
 	namespace->root = mnt;
 	mnt->mnt_namespace = namespace;
@@ -1510,6 +2024,8 @@ void __init mnt_init(unsigned long mempa
 	unsigned int nr_hash;
 	int i;
 
+	init_rwsem(&namespace_sem);
+
 	mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct vfsmount),
 			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
 
@@ -1557,7 +2073,7 @@ void __put_namespace(struct namespace *n
 {
 	struct vfsmount *mnt;
 
-	down_write(&namespace->sem);
+	down_write(&namespace_sem);
 	spin_lock(&vfsmount_lock);
 
 	list_for_each_entry(mnt, &namespace->list, mnt_list) {
@@ -1566,6 +2082,6 @@ void __put_namespace(struct namespace *n
 
 	umount_tree(namespace->root);
 	spin_unlock(&vfsmount_lock);
-	up_write(&namespace->sem);
+	up_write(&namespace_sem);
 	kfree(namespace);
 }
Index: 2.6.12.work2/fs/pnode.c
===================================================================
--- 2.6.12.work2.orig/fs/pnode.c
+++ 2.6.12.work2/fs/pnode.c
@@ -26,7 +26,6 @@
 #include <asm/unistd.h>
 #include <stdarg.h>
 
-
 static kmem_cache_t * pnode_cachep;
 
 /* spinlock for pnode related operations */
@@ -90,10 +89,12 @@ static void inline pnode_add_mnt(struct 
 	mnt->mnt_pnode = pnode;
 	if (slave) {
 		set_mnt_slave(mnt);
-		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_slavevfs);
+		list_add(&mnt->mnt_pnode_mntlist,
+				&pnode->pnode_slavevfs);
 	} else {
 		set_mnt_shared(mnt);
-		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_vfs);
+		list_add(&mnt->mnt_pnode_mntlist,
+				&pnode->pnode_vfs);
 	}
 	get_pnode(pnode);
 	spin_unlock(&vfspnode_lock);
@@ -111,7 +112,6 @@ void pnode_add_slave_mnt(struct vfspnode
 	pnode_add_mnt(pnode, mnt, 1);
 }
 
-
 void pnode_add_slave_pnode(struct vfspnode *pnode,
 		struct vfspnode *slave_pnode)
 {
@@ -439,3 +439,230 @@ error:
 	pnode_end(&context);
 	goto out;
 }
+
+int pnode_mount_func(struct vfspnode *pnode, void *indata,
+		void **outdata, va_list args)
+{
+	struct vfspnode *pnode_slave, *pnode_master;
+	int ret=0;
+
+	pnode_master = indata;
+
+	if (*outdata)
+		pnode_slave = *outdata;
+	else if (!(pnode_slave = pnode_alloc()))
+		return -ENOMEM;
+
+	*outdata = pnode_slave;
+
+	if (pnode_slave && pnode_master)
+		pnode_add_slave_pnode(pnode_master, pnode_slave);
+	return ret;
+}
+
+int vfs_make_mounted_func(struct vfsmount *mnt, enum pnode_vfs_type flag,
+		void *indata, va_list args)
+{
+	struct dentry *target_dentry;
+	int ret=0;
+	struct vfsmount *child_mount;
+	struct vfspnode *pnode;
+
+	target_dentry = va_arg(args, struct dentry *);
+	if (!(child_mount = do_make_mounted(mnt, target_dentry))) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	pnode = (struct vfspnode *)indata;
+	switch (flag) {
+	case PNODE_SLAVE_VFS :
+		pnode_add_slave_mnt(pnode, child_mount);
+		break;
+	case PNODE_MEMBER_VFS :
+		pnode_add_member_mnt(pnode, child_mount);
+		break;
+	}
+
+out:
+	return ret;
+}
+
+/*
+ * @pnode: pnode that contains the vfsmounts, on which the
+ *  		new mount is created at dentry 'dentry'
+ * @dentry: the dentry on which the new mount is created
+ * @mnt:   return the mount created on this vfsmount
+ * walk through all the vfsmounts belonging to this pnode
+ * as well as its slave pnodes and for each vfsmount create
+ * a new vfsmount at 'dentry'.  Return the vfsmount created
+ * at 'dentry' of vfsmount 'mnt'.
+ */
+struct vfsmount *pnode_make_mounted(struct vfspnode *pnode,
+		struct vfsmount *mnt, struct dentry *dentry)
+{
+	struct vfsmount *child_mnt;
+	struct vfspnode *child_pnode;
+
+	if (!(child_pnode = pnode_alloc()))
+		return NULL;
+
+	if (pnode_traverse(pnode, child_pnode, (void *)NULL,
+			pnode_mount_func, NULL, vfs_make_mounted_func,
+			(void *)dentry))
+  		goto error;
+	child_mnt = __lookup_mnt(mnt, dentry, dentry);
+	mntput(child_mnt);
+	return child_mnt;
+
+error:
+	pnode_make_unmounted(child_pnode);
+	return NULL;
+}
+
+int vfs_make_unmounted_func(struct vfsmount *mnt, enum pnode_vfs_type flag,
+		void *indata, va_list args)
+{
+	struct vfspnode *pnode;
+	int ret=0;
+
+	if (do_make_unmounted(mnt)) {
+		ret = 1;
+		goto out;
+	}
+
+	pnode = mnt->mnt_pnode;
+	spin_lock(&vfspnode_lock);
+	list_del_init(&mnt->mnt_pnode_mntlist);
+	put_pnode_locked(pnode);
+	spin_unlock(&vfspnode_lock);
+out:
+	return ret;
+}
+
+int pnode_make_unmounted(struct vfspnode *pnode)
+{
+	return pnode_traverse(pnode, NULL, (void *)NULL,
+			NULL, NULL, vfs_make_unmounted_func);
+}
+
+int vfs_prepare_mount_func(struct vfsmount *mnt, enum pnode_vfs_type flag,
+		void *indata, va_list args)
+{
+	struct vfsmount *source_mnt, *child_mnt, *p_mnt;
+	struct dentry *mountpoint_dentry;
+	struct vfspnode *pnode = (struct vfspnode *)indata;
+
+	source_mnt = va_arg(args, struct vfsmount * );
+	mountpoint_dentry =  va_arg(args, struct dentry *);
+	p_mnt =  va_arg(args, struct vfsmount *);
+
+	if ((p_mnt != mnt) || (source_mnt == source_mnt->mnt_parent)) {
+		child_mnt = do_attach_prepare_mnt(mnt, mountpoint_dentry,
+				source_mnt, (p_mnt != mnt));
+		if (!child_mnt)
+			return -ENOMEM;
+
+		if (child_mnt != source_mnt)
+			put_pnode(source_mnt->mnt_pnode);
+	} else
+		child_mnt = source_mnt;
+
+	switch (flag) {
+	case PNODE_SLAVE_VFS :
+		pnode_add_slave_mnt(pnode, child_mnt);
+		break;
+	case PNODE_MEMBER_VFS :
+		pnode_add_member_mnt(pnode, child_mnt);
+		break;
+	}
+
+	return 0;
+}
+
+int pnode_prepare_mount(struct vfspnode *pnode,
+		struct vfspnode *master_child_pnode,
+		struct dentry *mountpoint_dentry,
+		struct vfsmount *source_mnt,
+		struct vfsmount *mnt)
+{
+	return  pnode_traverse(pnode,
+			master_child_pnode,
+			(void *)NULL,
+			pnode_mount_func,
+			NULL,
+			vfs_prepare_mount_func,
+			source_mnt,
+			mountpoint_dentry,
+			mnt);
+}
+
+int pnode_commit_mount_post_func(struct vfspnode *pnode, void *indata,
+		va_list args)
+{
+	if (va_arg(args, int)) {
+		spin_lock(&vfspnode_lock);
+		BUG_ON(!list_empty(&pnode->pnode_vfs));
+		BUG_ON(!list_empty(&pnode->pnode_slavevfs));
+		BUG_ON(!list_empty(&pnode->pnode_slavepnode));
+		list_del_init(&pnode->pnode_peer_slave);
+		put_pnode_locked(pnode);
+		spin_unlock(&vfspnode_lock);
+	}
+	return 0;
+}
+
+int vfs_commit_mount_func(struct vfsmount *mnt, enum pnode_vfs_type flag,
+		void *indata, va_list args)
+{
+	BUG_ON(mnt == mnt->mnt_parent);
+	do_attach_commit_mnt(mnt);
+	if (va_arg(args, int)) {
+		spin_lock(&vfspnode_lock);
+		list_del_init(&mnt->mnt_pnode_mntlist);
+		put_pnode_locked(mnt->mnt_pnode);
+		spin_unlock(&vfspnode_lock);
+		mnt->mnt_pnode = NULL;
+	}
+	return 0;
+}
+
+/*
+ * @pnode: walk the propagation tree and complete the
+ * 	attachments of the child mounts to the parents
+ * 	correspondingly.
+ * @flag: if set destroy the propagation tree
+ */
+int pnode_commit_mount(struct vfspnode *pnode, int flag)
+{
+	return  pnode_traverse(pnode,
+			NULL, (void *)NULL, NULL, pnode_commit_mount_post_func,
+			vfs_commit_mount_func, flag);
+}
+
+int vfs_abort_mount_func(struct vfsmount *mnt,
+		enum pnode_vfs_type flag, void *indata, va_list args)
+
+{
+	struct vfsmount *exception_mnt = va_arg(args, struct vfsmount *);
+	BUG_ON(!mnt->mnt_pnode);
+	pnode_disassociate_mnt(mnt);
+	do_detach_prepare_mnt(mnt, (exception_mnt != mnt));
+	return 0;
+}
+
+/*
+ * clean the propagation tree under pnode, releasing all
+ * the mounts, except exception_mnt
+ * @pnode: the pnode tree to be cleaned up, unlinking and
+ * 	releasing all pnodes in the tree as well as
+ * 	unlinking any mounts, except 'exception_mnt'
+ * @exception_mnt: the mnt to be unlinked from pnode
+ * 		but not released.
+ */
+int pnode_abort_mount(struct vfspnode *pnode,
+		struct vfsmount *exception_mnt)
+{
+	return  pnode_traverse(pnode,
+			NULL, (void *)NULL, NULL, NULL,
+			vfs_abort_mount_func, exception_mnt);
+}
Index: 2.6.12.work2/include/linux/fs.h
===================================================================
--- 2.6.12.work2.orig/include/linux/fs.h
+++ 2.6.12.work2/include/linux/fs.h
@@ -1216,7 +1216,12 @@ extern struct vfsmount *kern_mount(struc
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 extern long do_mount(char *, char *, char *, unsigned long, void *);
+extern struct vfsmount *do_attach_prepare_mnt(struct vfsmount *,
+		struct dentry *, struct vfsmount *, int);
+extern void do_attach_commit_mnt(struct vfsmount *);
 extern struct vfsmount *do_make_mounted(struct vfsmount *, struct dentry *);
+extern int do_make_unmounted(struct vfsmount *);
+extern void do_detach_prepare_mnt(struct vfsmount *, int);
 
 extern int vfs_statfs(struct super_block *, struct kstatfs *);
 
Index: 2.6.12.work2/include/linux/namespace.h
===================================================================
--- 2.6.12.work2.orig/include/linux/namespace.h
+++ 2.6.12.work2/include/linux/namespace.h
@@ -9,7 +9,6 @@ struct namespace {
 	atomic_t		count;
 	struct vfsmount *	root;
 	struct list_head	list;
-	struct rw_semaphore	sem;
 };
 
 extern void umount_tree(struct vfsmount *);
Index: 2.6.12.work2/include/linux/dcache.h
===================================================================
--- 2.6.12.work2.orig/include/linux/dcache.h
+++ 2.6.12.work2/include/linux/dcache.h
@@ -329,6 +329,8 @@ static inline int d_mountpoint(struct de
 }
 
 extern struct vfsmount *lookup_mnt(struct vfsmount *, struct dentry *);
+extern struct vfsmount *__lookup_mnt(struct vfsmount *,
+		struct dentry *, struct dentry *);
 extern struct dentry *lookup_create(struct nameidata *nd, int is_dir);
 
 extern int sysctl_vfs_cache_pressure;
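
pnode_prepare_mount(), pnode_commit_mount() and pnode_abort_mount() above
split propagation into a staging step followed by either a commit or a
rollback. A minimal userspace sketch of that two-phase pattern follows. The
names struct slot, stage_all(), commit_all(), abort_all() and propagate()
are hypothetical and are not part of the patch; the sketch only illustrates
the control flow, not the kernel's locking or reference counting.

```c
#include <stddef.h>

/* Hypothetical stand-in for a mount being propagated; not a kernel type. */
enum slot_state { SLOT_IDLE, SLOT_STAGED, SLOT_ATTACHED };

struct slot {
	enum slot_state state;
	int fail_on_stage;	/* simulate an allocation failure */
};

/* Phase 1: stage each slot; stop at the first failure. */
int stage_all(struct slot *s, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i].fail_on_stage)
			return -1;
		s[i].state = SLOT_STAGED;
	}
	return 0;
}

/* Phase 2a: attach everything that was staged. */
void commit_all(struct slot *s, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (s[i].state == SLOT_STAGED)
			s[i].state = SLOT_ATTACHED;
}

/* Phase 2b: undo the staging after a failure. */
void abort_all(struct slot *s, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (s[i].state == SLOT_STAGED)
			s[i].state = SLOT_IDLE;
}

/* Returns 0 if every slot was attached, -1 if everything was rolled back. */
int propagate(struct slot *s, size_t n)
{
	if (stage_all(s, n)) {
		abort_all(s, n);
		return -1;
	}
	commit_all(s, n);
	return 0;
}
```

The point of the split is that a failure halfway through never leaves a
partially attached set behind: either all slots end up attached or none do.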


* (unknown)
  2005-07-25 22:44 (unknown) Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
@ 2005-07-25 22:44 ` Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 211+ messages in thread
From: Ram Pai @ 2005-07-25 22:44 UTC (permalink / raw)
  To: akpm, Al Viro; +Cc: Avantika Mathur, Mike Waychison, miklos@szeredi.hu, Janak Desai <janak@us.ibm.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 2/7] shared subtree
Content-Type: text/x-patch; name=unclone.patch
Content-Disposition: inline; filename=unclone.patch

 Adds the ability to unclone a vfs tree. An uncloned vfs tree is not
 clonable, and hence cannot be bind or rbind mounted at any other mountpoint.

 RP

Signed by Ram Pai (linuxram@us.ibm.com)

 fs/namespace.c        |   15 ++++++++++++++-
 include/linux/fs.h    |    1 +
 include/linux/mount.h |   15 +++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)
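
The set_mnt_*() helpers in this patch keep the four propagation types
mutually exclusive: setting one clears the other three. A minimal userspace
sketch of that invariant follows. The flag values mirror the patch's
include/linux/mount.h, but set_propagation() and may_clone() are
hypothetical helpers written for illustration; how the series actually
rejects a bind of an unclonable mount is not shown in this patch.

```c
/* Values mirror the patch's include/linux/mount.h. */
#define MNT_PRIVATE	0x10
#define MNT_SLAVE	0x20
#define MNT_SHARED	0x40
#define MNT_UNCLONE	0x80
#define MNT_PNODE_MASK	0xf0

/* Hypothetical helper: replace the propagation type, keep all other bits. */
int set_propagation(int flags, int type)
{
	return (flags & ~MNT_PNODE_MASK) | (type & MNT_PNODE_MASK);
}

/* Hypothetical check: a mount may be cloned unless it is unclonable. */
int may_clone(int flags)
{
	return !(flags & MNT_UNCLONE);
}
```

Clearing the whole mask in one step gives the same result as the patch's
one-macro-per-type clearing, as long as exactly one type bit is passed in.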

Index: 2.6.12.work2/fs/namespace.c
===================================================================
--- 2.6.12.work2.orig/fs/namespace.c
+++ 2.6.12.work2/fs/namespace.c
@@ -673,6 +673,14 @@ static int do_make_private(struct vfsmou
 	return 0;
 }
 
+static int do_make_unclone(struct vfsmount *mnt)
+{
+	if(mnt->mnt_pnode)
+		pnode_disassociate_mnt(mnt);
+	set_mnt_unclone(mnt);
+	return 0;
+}
+
 /*
  * recursively change the type of the mountpoint.
  */
@@ -682,6 +690,7 @@ static int do_change_type(struct nameida
 	int err=0;
 
 	if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE)
+			&& !(flag & MS_UNCLONE)
 			&& !(flag & MS_SLAVE))
 		return -EINVAL;
 
@@ -700,6 +709,9 @@ static int do_change_type(struct nameida
 		case MS_PRIVATE:
 			err = do_make_private(m);
 			break;
+		case MS_UNCLONE:
+			err = do_make_unclone(m);
+			break;
 		}
 	}
 	spin_unlock(&vfsmount_lock);
@@ -1140,7 +1152,8 @@ long do_mount(char * dev_name, char * di
 				    data_page);
 	else if (flags & MS_BIND)
 		retval = do_loopback(&nd, dev_name, flags & MS_REC);
-	else if (flags & MS_SHARED || flags & MS_PRIVATE || flags & MS_SLAVE)
+	else if (flags & MS_SHARED || flags & MS_UNCLONE ||
+			flags & MS_PRIVATE || flags & MS_SLAVE)
 		retval = do_change_type(&nd, flags);
 	else if (flags & MS_MOVE)
 		retval = do_move_mount(&nd, dev_name);
Index: 2.6.12.work2/include/linux/fs.h
===================================================================
--- 2.6.12.work2.orig/include/linux/fs.h
+++ 2.6.12.work2/include/linux/fs.h
@@ -102,6 +102,7 @@ extern int dir_notify_enable;
 #define MS_MOVE		8192
 #define MS_REC		16384
 #define MS_VERBOSE	32768
+#define MS_UNCLONE	(1<<17) /* recursively change to unclonable */
 #define MS_PRIVATE	(1<<18) /* recursively change to private */
 #define MS_SLAVE	(1<<19) /* recursively change to slave */
 #define MS_SHARED	(1<<20) /* recursively change to shared */
Index: 2.6.12.work2/include/linux/mount.h
===================================================================
--- 2.6.12.work2.orig/include/linux/mount.h
+++ 2.6.12.work2/include/linux/mount.h
@@ -22,15 +22,18 @@
 #define MNT_PRIVATE	0x10  /* if the vfsmount is private, by default it is private*/
 #define MNT_SLAVE	0x20  /* if the vfsmount is a slave mount of its pnode */
 #define MNT_SHARED	0x40  /* if the vfsmount is a slave mount of its pnode */
+#define MNT_UNCLONE	0x80  /* if the vfsmount is unclonable */
 #define MNT_PNODE_MASK	0xf0  /* propogation flag mask */
 
 #define IS_MNT_SHARED(mnt) (mnt->mnt_flags & MNT_SHARED)
 #define IS_MNT_SLAVE(mnt) (mnt->mnt_flags & MNT_SLAVE)
 #define IS_MNT_PRIVATE(mnt) (mnt->mnt_flags & MNT_PRIVATE)
+#define IS_MNT_UNCLONE(mnt) (mnt->mnt_flags & MNT_UNCLONE)
 
 #define CLEAR_MNT_SHARED(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SHARED))
 #define CLEAR_MNT_PRIVATE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_PRIVATE))
 #define CLEAR_MNT_SLAVE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SLAVE))
+#define CLEAR_MNT_UNCLONE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_UNCLONE))
 
 struct vfsmount
 {
@@ -59,6 +62,7 @@ static inline void set_mnt_shared(struct
 	mnt->mnt_flags |= MNT_PNODE_MASK & MNT_SHARED;
 	CLEAR_MNT_PRIVATE(mnt);
 	CLEAR_MNT_SLAVE(mnt);
+	CLEAR_MNT_UNCLONE(mnt);
 }
 
 static inline void set_mnt_private(struct vfsmount *mnt)
@@ -66,6 +70,16 @@ static inline void set_mnt_private(struc
 	mnt->mnt_flags |= MNT_PNODE_MASK & MNT_PRIVATE;
 	CLEAR_MNT_SLAVE(mnt);
 	CLEAR_MNT_SHARED(mnt);
+	CLEAR_MNT_UNCLONE(mnt);
+	mnt->mnt_pnode = NULL;
+}
+
+static inline void set_mnt_unclone(struct vfsmount *mnt)
+{
+	mnt->mnt_flags |= MNT_PNODE_MASK & MNT_UNCLONE;
+	CLEAR_MNT_SLAVE(mnt);
+	CLEAR_MNT_SHARED(mnt);
+	CLEAR_MNT_PRIVATE(mnt);
 	mnt->mnt_pnode = NULL;
 }
 
@@ -74,6 +88,7 @@ static inline void set_mnt_slave(struct 
 	mnt->mnt_flags |= MNT_PNODE_MASK & MNT_SLAVE;
 	CLEAR_MNT_PRIVATE(mnt);
 	CLEAR_MNT_SHARED(mnt);
+	CLEAR_MNT_UNCLONE(mnt);
 }
 
 static inline struct vfsmount *mntget(struct vfsmount *mnt)


* (unknown)
  2005-07-25 22:44 (unknown) Ram Pai
@ 2005-07-25 22:44 ` Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 211+ messages in thread
From: Ram Pai @ 2005-07-25 22:44 UTC (permalink / raw)
  To: akpm, Al Viro; +Cc: Avantika Mathur, Mike Waychison, miklos@szeredi.hu, Janak Desai <janak@us.ibm.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/7] shared subtree
Content-Type: text/x-patch; name=shared_private_slave.patch
Content-Disposition: inline; filename=shared_private_slave.patch

This patch adds the shared/private/slave support for VFS trees.

Signed by Ram Pai (linuxram@us.ibm.com)

 fs/Makefile           |    2 
 fs/dcache.c           |    2 
 fs/namespace.c        |   93 ++++++++++
 fs/pnode.c            |  441 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h    |    5 
 include/linux/mount.h |   44 ++++
 include/linux/pnode.h |   90 ++++++++++
 7 files changed, 673 insertions(+), 4 deletions(-)
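
The type-change rules implemented by do_make_shared(), do_make_slave() and
do_make_private() in this patch can be summarized as a small state machine:
shared and private are always reachable, while slave is reachable only from
shared (or is a no-op if the mount is already a slave). A minimal userspace
sketch, assuming a hypothetical enum mnt_type and change_type() that are not
part of the patch, and ignoring the -ENOMEM path of do_make_shared():

```c
#include <errno.h>

/* Hypothetical model of a mount's propagation type. */
enum mnt_type { TYPE_PRIVATE, TYPE_SLAVE, TYPE_SHARED };

/* Returns 0 on success, -EINVAL for a disallowed transition. */
int change_type(enum mnt_type *cur, enum mnt_type want)
{
	switch (want) {
	case TYPE_SHARED:
	case TYPE_PRIVATE:
		*cur = want;		/* always allowed */
		return 0;
	case TYPE_SLAVE:
		if (*cur == TYPE_SLAVE)
			return 0;	/* already a slave: nothing to do */
		if (*cur != TYPE_SHARED)
			return -EINVAL;	/* only shared mounts can be made slave */
		*cur = TYPE_SLAVE;
		return 0;
	}
	return -EINVAL;
}
```

For example, a private mount must first be made shared before it can be
made a slave; asking for slave directly returns -EINVAL.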

Index: 2.6.12.work2/fs/namespace.c
===================================================================
--- 2.6.12.work2.orig/fs/namespace.c
+++ 2.6.12.work2/fs/namespace.c
@@ -22,6 +22,7 @@
 #include <linux/namei.h>
 #include <linux/security.h>
 #include <linux/mount.h>
+#include <linux/pnode.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 
@@ -62,6 +63,7 @@ struct vfsmount *alloc_vfsmnt(const char
 		INIT_LIST_HEAD(&mnt->mnt_mounts);
 		INIT_LIST_HEAD(&mnt->mnt_list);
 		INIT_LIST_HEAD(&mnt->mnt_fslink);
+		INIT_LIST_HEAD(&mnt->mnt_pnode_mntlist);
 		if (name) {
 			int size = strlen(name)+1;
 			char *newname = kmalloc(size, GFP_KERNEL);
@@ -615,6 +617,95 @@ out_unlock:
 	return err;
 }
 
+static int do_make_shared(struct vfsmount *mnt)
+{
+	int err=0;
+	struct vfspnode *old_pnode = NULL;
+	/*
+	 * if the mount is already a slave mount,
+	 * allocate a new pnode and make it
+	 * a slave pnode of the original pnode.
+	 */
+	if (IS_MNT_SLAVE(mnt)) {
+		old_pnode = mnt->mnt_pnode;
+		pnode_del_slave_mnt(mnt);
+	}
+	if(!IS_MNT_SHARED(mnt)) {
+		mnt->mnt_pnode = pnode_alloc();
+		if(!mnt->mnt_pnode) {
+			pnode_add_slave_mnt(old_pnode, mnt);
+			err = -ENOMEM;
+			goto out;
+		}
+		pnode_add_member_mnt(mnt->mnt_pnode, mnt);
+	}
+	if(old_pnode)
+		pnode_add_slave_pnode(old_pnode, mnt->mnt_pnode);
+	set_mnt_shared(mnt);
+out:
+	return err;
+}
+
+static int do_make_slave(struct vfsmount *mnt)
+{
+	int err=0;
+
+	if (IS_MNT_SLAVE(mnt))
+		goto out;
+	/*
+	 * only shared mounts can
+	 * be made slave
+	 */
+	if (!IS_MNT_SHARED(mnt)) {
+		err = -EINVAL;
+		goto out;
+	}
+	pnode_member_to_slave(mnt);
+out:
+	return err;
+}
+
+static int do_make_private(struct vfsmount *mnt)
+{
+	if(mnt->mnt_pnode)
+		pnode_disassociate_mnt(mnt);
+	set_mnt_private(mnt);
+	return 0;
+}
+
+/*
+ * recursively change the type of the mountpoint.
+ */
+static int do_change_type(struct nameidata *nd, int flag)
+{
+	struct vfsmount *m, *mnt = nd->mnt;
+	int err=0;
+
+	if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE)
+			&& !(flag & MS_SLAVE))
+		return -EINVAL;
+
+	if (nd->dentry != nd->mnt->mnt_root)
+		return -EINVAL;
+
+	spin_lock(&vfsmount_lock);
+	for (m = mnt; m; m = next_mnt(m, mnt)) {
+		switch (flag) {
+		case MS_SHARED:
+			err = do_make_shared(m);
+			break;
+		case MS_SLAVE:
+			err = do_make_slave(m);
+			break;
+		case MS_PRIVATE:
+			err = do_make_private(m);
+			break;
+		}
+	}
+	spin_unlock(&vfsmount_lock);
+	return err;
+}
+
 /*
  * do loopback mount.
  */
@@ -1049,6 +1140,8 @@ long do_mount(char * dev_name, char * di
 				    data_page);
 	else if (flags & MS_BIND)
 		retval = do_loopback(&nd, dev_name, flags & MS_REC);
+	else if (flags & MS_SHARED || flags & MS_PRIVATE || flags & MS_SLAVE)
+		retval = do_change_type(&nd, flags);
 	else if (flags & MS_MOVE)
 		retval = do_move_mount(&nd, dev_name);
 	else
Index: 2.6.12.work2/fs/pnode.c
===================================================================
--- /dev/null
+++ 2.6.12.work2/fs/pnode.c
@@ -0,0 +1,441 @@
+/*
+ *  linux/fs/pnode.c
+ *
+ * (C) Copyright IBM Corporation 2005.
+ *	Released under GPL v2.
+ *	Author : Ram Pai (linuxram@us.ibm.com)
+ *
+ */
+
+#include <linux/config.h>
+#include <linux/syscalls.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/smp_lock.h>
+#include <linux/init.h>
+#include <linux/quotaops.h>
+#include <linux/acct.h>
+#include <linux/module.h>
+#include <linux/seq_file.h>
+#include <linux/namespace.h>
+#include <linux/namei.h>
+#include <linux/security.h>
+#include <linux/mount.h>
+#include <linux/pnode.h>
+#include <asm/uaccess.h>
+#include <asm/unistd.h>
+#include <stdarg.h>
+
+
+static kmem_cache_t * pnode_cachep;
+
+/* spinlock for pnode related operations */
+ __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfspnode_lock);
+
+enum pnode_vfs_type {
+	PNODE_MEMBER_VFS = 0x01,
+	PNODE_SLAVE_VFS = 0x02
+};
+
+void __init pnode_init(unsigned long mempages)
+{
+	pnode_cachep = kmem_cache_create("pnode_cache",
+                       sizeof(struct vfspnode), 0,
+                       SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
+}
+
+struct vfspnode * pnode_alloc(void)
+{
+	struct vfspnode *pnode =  kmem_cache_alloc(pnode_cachep, GFP_KERNEL);
+	INIT_LIST_HEAD(&pnode->pnode_vfs);
+	INIT_LIST_HEAD(&pnode->pnode_slavevfs);
+	INIT_LIST_HEAD(&pnode->pnode_slavepnode);
+	INIT_LIST_HEAD(&pnode->pnode_peer_slave);
+	pnode->pnode_master = NULL;
+	pnode->pnode_flags = 0;
+	atomic_set(&pnode->pnode_count,0);
+	return pnode;
+}
+
+void inline pnode_free(struct vfspnode *pnode)
+{
+	kmem_cache_free(pnode_cachep, pnode);
+}
+
+/*
+ * __put_pnode() should be called with vfspnode_lock held
+ */
+void __put_pnode(struct vfspnode *pnode)
+{
+	struct vfspnode *tmp_pnode;
+	do {
+		tmp_pnode = pnode->pnode_master;
+		list_del_init(&pnode->pnode_peer_slave);
+		BUG_ON(!list_empty(&pnode->pnode_vfs));
+		BUG_ON(!list_empty(&pnode->pnode_slavevfs));
+		BUG_ON(!list_empty(&pnode->pnode_slavepnode));
+		pnode_free(pnode);
+		pnode = tmp_pnode;
+		if (!pnode || !atomic_dec_and_test(&pnode->pnode_count))
+			break;
+	} while(pnode);
+}
+
+static void inline pnode_add_mnt(struct vfspnode *pnode,
+		struct vfsmount *mnt, int slave)
+{
+	if (!pnode || !mnt)
+		return;
+	spin_lock(&vfspnode_lock);
+	mnt->mnt_pnode = pnode;
+	if (slave) {
+		set_mnt_slave(mnt);
+		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_slavevfs);
+	} else {
+		set_mnt_shared(mnt);
+		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_vfs);
+	}
+	get_pnode(pnode);
+	spin_unlock(&vfspnode_lock);
+}
+
+void pnode_add_member_mnt(struct vfspnode *pnode,
+		struct vfsmount *mnt)
+{
+	pnode_add_mnt(pnode, mnt, 0);
+}
+
+void pnode_add_slave_mnt(struct vfspnode *pnode,
+		struct vfsmount *mnt)
+{
+	pnode_add_mnt(pnode, mnt, 1);
+}
+
+
+void pnode_add_slave_pnode(struct vfspnode *pnode,
+		struct vfspnode *slave_pnode)
+{
+	if (!pnode || !slave_pnode)
+		return;
+	spin_lock(&vfspnode_lock);
+	slave_pnode->pnode_master = pnode;
+	slave_pnode->pnode_flags = 0;
+	list_add(&slave_pnode->pnode_peer_slave, &pnode->pnode_slavepnode);
+	get_pnode(pnode);
+	spin_unlock(&vfspnode_lock);
+}
+
+/*
+ * merge 'pnode' into 'peer_pnode' and get rid of pnode
+ * @pnode: pnode the contents of which have to be merged
+ * @peer_pnode: pnode into which the contents are merged
+ */
+int pnode_merge_pnode(struct vfspnode *pnode, struct vfspnode *peer_pnode)
+{
+	struct vfspnode *slave_pnode, *pnext;
+	struct vfsmount *mnt, *slave_mnt, *next;
+
+	list_for_each_entry_safe(slave_pnode,  pnext,
+			&pnode->pnode_slavepnode, pnode_peer_slave) {
+		slave_pnode->pnode_master = peer_pnode;
+		list_move(&slave_pnode->pnode_peer_slave,
+				&peer_pnode->pnode_slavepnode);
+		put_pnode_locked(pnode);
+		get_pnode(peer_pnode);
+	}
+
+	list_for_each_entry_safe(slave_mnt,  next,
+			&pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+		slave_mnt->mnt_pnode = peer_pnode;
+		list_move(&slave_mnt->mnt_pnode_mntlist,
+				&peer_pnode->pnode_slavevfs);
+		put_pnode_locked(pnode);
+		get_pnode(peer_pnode);
+	}
+
+	list_for_each_entry_safe(mnt, next,
+			&pnode->pnode_vfs, mnt_pnode_mntlist) {
+		mnt->mnt_pnode = peer_pnode;
+		list_move(&mnt->mnt_pnode_mntlist,
+				&peer_pnode->pnode_vfs);
+		put_pnode_locked(pnode);
+		get_pnode(peer_pnode);
+	}
+	return 0;
+}
+
+/*
+ * called when pnode has no member mounts.  Merge all the slave mounts/pnodes
+ * of this pnode with that of its master pnode. If master pnode does not exist,
+ * convert all the slave mounts to private mounts.
+ */
+static void empty_pnode(struct vfspnode *pnode) { struct vfsmount *slave_mnt,
+	*next; struct vfspnode *master_pnode, *slave_pnode, *pnext;
+
+	if ((master_pnode = pnode->pnode_master)) {
+		pnode->pnode_master = NULL;
+		list_del_init(&pnode->pnode_peer_slave);
+		pnode_merge_pnode(pnode, master_pnode);
+		put_pnode_locked(master_pnode);
+	} else {
+		list_for_each_entry_safe(slave_mnt, next,
+			&pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+			list_del_init(&slave_mnt->mnt_pnode_mntlist);
+			set_mnt_private(slave_mnt);
+			put_pnode_locked(pnode);
+		}
+		list_for_each_entry_safe(slave_pnode,  pnext,
+			&pnode->pnode_slavepnode, pnode_peer_slave) {
+			slave_pnode->pnode_master = NULL;
+			list_del_init(&slave_pnode->pnode_peer_slave);
+			put_pnode_locked(pnode);
+		}
+	}
+}
+
+static void __pnode_disassociate_mnt(struct vfsmount *mnt)
+{
+	struct vfspnode *pnode = mnt->mnt_pnode;
+
+	spin_lock(&vfspnode_lock);
+	list_del_init(&mnt->mnt_pnode_mntlist);
+
+	if (list_empty(&pnode->pnode_vfs))
+		empty_pnode(pnode);
+
+	put_pnode_locked(pnode);
+
+	spin_unlock(&vfspnode_lock);
+	mnt->mnt_pnode = NULL;
+}
+
+void pnode_del_slave_mnt(struct vfsmount *mnt)
+{
+	if (!mnt || !mnt->mnt_pnode)
+		return;
+ 	__pnode_disassociate_mnt(mnt);
+	CLEAR_MNT_SLAVE(mnt);
+}
+
+void pnode_del_member_mnt(struct vfsmount *mnt)
+{
+	if (!mnt || !mnt->mnt_pnode)
+		return;
+ 	__pnode_disassociate_mnt(mnt);
+	CLEAR_MNT_SHARED(mnt);
+}
+
+void pnode_member_to_slave(struct vfsmount *mnt)
+{
+	struct vfspnode *pnode = mnt->mnt_pnode;
+	if (!mnt || !pnode)
+		return;
+
+	spin_lock(&vfspnode_lock);
+
+	list_del_init(&mnt->mnt_pnode_mntlist);
+	list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_slavevfs);
+	set_mnt_slave(mnt);
+
+	if (list_empty(&pnode->pnode_vfs))
+		empty_pnode(pnode);
+
+	spin_unlock(&vfspnode_lock);
+	return;
+}
+
+void pnode_disassociate_mnt(struct vfsmount *mnt)
+{
+	if (!mnt || !mnt->mnt_pnode)
+		return;
+ 	__pnode_disassociate_mnt(mnt);
+	CLEAR_MNT_SHARED(mnt);
+	CLEAR_MNT_SLAVE(mnt);
+}
+
+struct pcontext {
+	struct vfspnode *start;
+	int	level;
+	struct vfspnode *master_pnode;
+	struct vfspnode *pnode;
+};
+
+/*
+ * Walk the pnode tree for each pnode encountered.
+ * @context: provides context on the state of the last walk in the pnode
+ * 		tree.
+ */
+static int pnode_next(struct pcontext *context)
+{
+	struct vfspnode *pnode = context->pnode;
+	struct vfspnode	*master_pnode=context->master_pnode;
+	struct list_head *next;
+
+	if (!pnode) {
+		BUG_ON(!context->start);
+		get_pnode(context->start);
+		context->pnode = context->start;
+		context->master_pnode = NULL;
+		context->level = 0;
+		return 1;
+	}
+
+	spin_lock(&vfspnode_lock);
+	next = pnode->pnode_slavepnode.next;
+	if (next == &pnode->pnode_slavepnode) {
+		while (1) {
+			int flag;
+
+			if (pnode == context->start) {
+				put_pnode_locked(pnode);
+				spin_unlock(&vfspnode_lock);
+				BUG_ON(context->level != 0);
+				return 0;
+			}
+
+			next = pnode->pnode_peer_slave.next;
+			flag = (next != &pnode->pnode_master->pnode_slavepnode);
+			put_pnode_locked(pnode);
+
+			if (flag)
+				break;
+
+			pnode = master_pnode;
+			master_pnode = pnode->pnode_master;
+			context->level--;
+		}
+	} else {
+		master_pnode = pnode;
+		context->level++;
+	}
+
+	pnode = list_entry(next, struct vfspnode, pnode_peer_slave);
+	get_pnode(pnode);
+
+	context->pnode = pnode;
+	context->master_pnode = master_pnode;
+	spin_unlock(&vfspnode_lock);
+	return 1;
+}
+
+/*
+ * skip the rest of the tree, cleaning up
+ * reference to pnodes held in pnode_next().
+ */
+static void pnode_end(struct pcontext *context)
+{
+	struct vfspnode *p = context->pnode;
+	struct vfspnode *start = context->start;
+
+	do {
+		put_pnode(p);
+	} while (p != start && (p = p->pnode_master));
+	return;
+}
+
+/*
+ * traverse the pnode tree; at each pnode call pnode_pre_func() before and
+ * pnode_post_func() after its mounts. For each vfsmount call vfs_func().
+ *
+ * @pnode: pnode tree to be traversed
+ * @in_data: input data
+ * @out_data: output data
+ * @pnode_pre_func, @pnode_post_func: called on each pnode encountered.
+ * @vfs_func: function to be called on each slave and member vfs belonging
+ * 		to the pnode.
+ */
+static int pnode_traverse(struct vfspnode *pnode,
+		void *in_data,
+		void **out_data,
+		int (*pnode_pre_func)(struct vfspnode *,
+			void *, void **, va_list),
+		int (*pnode_post_func)(struct vfspnode *,
+			void *, va_list),
+		int (*vfs_func)(struct vfsmount *,
+			enum pnode_vfs_type, void *,  va_list),
+		...)
+{
+	va_list args;
+	int ret = 0, level;
+	void *my_data, *data_from_master;
+     	struct vfspnode *master_pnode;
+     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
+	struct pcontext context;
+	static void *p_array[PNODE_MAX_SLAVE_LEVEL];
+
+	context.start = pnode;
+	context.pnode = NULL;
+	/*
+	 * determine whether to process vfs first or the
+	 * slave pnode first
+	 */
+	while (pnode_next(&context)) {
+		level = context.level;
+
+		if (level >= PNODE_MAX_SLAVE_LEVEL)
+			goto error;
+
+		pnode = context.pnode;
+		master_pnode = context.master_pnode;
+
+		if (master_pnode) {
+			data_from_master = p_array[level-1];
+			my_data = NULL;
+		} else {
+			data_from_master = NULL;
+			my_data = in_data;
+		}
+
+		if (pnode_pre_func) {
+			va_start(args, vfs_func);
+			if((ret = pnode_pre_func(pnode,
+				data_from_master, &my_data, args)))
+				goto error;
+			va_end(args);
+		}
+
+		// traverse member vfsmounts
+		spin_lock(&vfspnode_lock);
+		list_for_each_entry_safe(member_mnt,
+			t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
+
+			spin_unlock(&vfspnode_lock);
+			va_start(args, vfs_func);
+			if ((ret = vfs_func(member_mnt,
+				PNODE_MEMBER_VFS, my_data, args)))
+				goto error;
+			va_end(args);
+			spin_lock(&vfspnode_lock);
+		}
+		list_for_each_entry_safe(slave_mnt, t_m,
+			&pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+
+			spin_unlock(&vfspnode_lock);
+			va_start(args, vfs_func);
+			if ((ret = vfs_func(slave_mnt, PNODE_SLAVE_VFS,
+				my_data, args)))
+				goto error;
+			va_end(args);
+			spin_lock(&vfspnode_lock);
+		}
+		spin_unlock(&vfspnode_lock);
+
+		if (pnode_post_func) {
+			va_start(args, vfs_func);
+			if((ret = pnode_post_func(pnode,
+				my_data, args)))
+				goto error;
+			va_end(args);
+		}
+
+		p_array[level] = my_data;
+	}
+out:
+	if (out_data)
+		*out_data = p_array[0];
+	return ret;
+error:
+	va_end(args);
+	pnode_end(&context);
+	goto out;
+}
Index: 2.6.12.work2/fs/dcache.c
===================================================================
--- 2.6.12.work2.orig/fs/dcache.c
+++ 2.6.12.work2/fs/dcache.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/mount.h>
 #include <linux/file.h>
+#include <linux/pnode.h>
 #include <asm/uaccess.h>
 #include <linux/security.h>
 #include <linux/seqlock.h>
@@ -1737,6 +1738,7 @@ void __init vfs_caches_init(unsigned lon
 	inode_init(mempages);
 	files_init(mempages);
 	mnt_init(mempages);
+	pnode_init(mempages);
 	bdev_cache_init();
 	chrdev_init();
 }
Index: 2.6.12.work2/include/linux/fs.h
===================================================================
--- 2.6.12.work2.orig/include/linux/fs.h
+++ 2.6.12.work2/include/linux/fs.h
@@ -102,6 +102,9 @@ extern int dir_notify_enable;
 #define MS_MOVE		8192
 #define MS_REC		16384
 #define MS_VERBOSE	32768
+#define MS_PRIVATE	(1<<18) /* recursively change to private */
+#define MS_SLAVE	(1<<19) /* recursively change to slave */
+#define MS_SHARED	(1<<20) /* recursively change to shared */
 #define MS_POSIXACL	(1<<16)	/* VFS does not apply the umask */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
@@ -232,6 +235,7 @@ extern void update_atime (struct inode *
 extern void __init inode_init(unsigned long);
 extern void __init inode_init_early(void);
 extern void __init mnt_init(unsigned long);
+extern void __init pnode_init(unsigned long);
 extern void __init files_init(unsigned long);
 
 struct buffer_head;
@@ -1211,6 +1215,7 @@ extern struct vfsmount *kern_mount(struc
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 extern long do_mount(char *, char *, char *, unsigned long, void *);
+extern struct vfsmount *do_make_mounted(struct vfsmount *, struct dentry *);
 
 extern int vfs_statfs(struct super_block *, struct kstatfs *);
 
Index: 2.6.12.work2/include/linux/pnode.h
===================================================================
--- /dev/null
+++ 2.6.12.work2/include/linux/pnode.h
@@ -0,0 +1,90 @@
+/*
+ *  linux/include/linux/pnode.h
+ *
+ * (C) Copyright IBM Corporation 2005.
+ *	Released under GPL v2.
+ *
+ */
+#ifndef _LINUX_PNODE_H
+#define _LINUX_PNODE_H
+
+#include <linux/list.h>
+#include <linux/mount.h>
+#include <linux/spinlock.h>
+#include <asm/atomic.h>
+
+struct vfspnode {
+	struct list_head pnode_vfs; 	 /* list of vfsmounts anchored here */
+	struct list_head pnode_slavevfs; /* list of slave vfsmounts */
+	struct list_head pnode_slavepnode;/* list of slave pnode */
+	struct list_head pnode_peer_slave;/* going through master's slave pnode
+					    list*/
+	struct vfspnode	 *pnode_master;	  /* master pnode */
+	int 		 pnode_flags;
+	atomic_t 	 pnode_count;
+};
+#define PNODE_MAX_SLAVE_LEVEL 32  /* MAXIMUM DEPTH OF THE PNODE TREE */
+#define PNODE_DELETE  0x01
+#define PNODE_SLAVE   0x02
+
+#define IS_PNODE_DELETE(pn)  ((pn->pnode_flags&PNODE_DELETE)==PNODE_DELETE)
+#define IS_PNODE_SLAVE(pn)  ((pn->pnode_flags&PNODE_SLAVE)==PNODE_SLAVE)
+#define SET_PNODE_DELETE(pn)  pn->pnode_flags |= PNODE_DELETE
+#define SET_PNODE_SLAVE(pn)  pn->pnode_flags |= PNODE_SLAVE
+
+extern spinlock_t vfspnode_lock;
+extern void __put_pnode(struct vfspnode *);
+
+static inline struct vfspnode *
+get_pnode(struct vfspnode *pnode)
+{
+	if (!pnode)
+		return NULL;
+	atomic_inc(&pnode->pnode_count);
+	return pnode;
+}
+
+static inline void
+put_pnode(struct vfspnode *pnode)
+{
+	if (!pnode)
+		return;
+	if (atomic_dec_and_lock(&pnode->pnode_count, &vfspnode_lock)) {
+		__put_pnode(pnode);
+		spin_unlock(&vfspnode_lock);
+	}
+}
+
+/*
+ * must be called holding the vfspnode_lock
+ */
+static inline void
+put_pnode_locked(struct vfspnode *pnode)
+{
+	if (!pnode)
+		return;
+	if (atomic_dec_and_test(&pnode->pnode_count)) {
+		__put_pnode(pnode);
+	}
+}
+
+void __init pnode_init(unsigned long );
+struct vfspnode * pnode_alloc(void);
+void pnode_add_slave_mnt(struct vfspnode *, struct vfsmount *);
+void pnode_add_member_mnt(struct vfspnode *, struct vfsmount *);
+void pnode_del_slave_mnt(struct vfsmount *);
+void pnode_del_member_mnt(struct vfsmount *);
+void pnode_disassociate_mnt(struct vfsmount *);
+void pnode_add_slave_pnode(struct vfspnode *, struct vfspnode *);
+struct vfsmount * pnode_make_mounted(struct vfspnode *, struct vfsmount *,
+		struct dentry *);
+void pnode_member_to_slave(struct vfsmount *);
+int pnode_merge_pnode(struct vfspnode *, struct vfspnode *);
+struct vfsmount * pnode_make_mounted(struct vfspnode *, struct vfsmount *,
+		struct dentry *);
+int  pnode_make_unmounted(struct vfspnode *);
+int pnode_prepare_mount(struct vfspnode *, struct vfspnode *, struct dentry *,
+		struct vfsmount *, struct vfsmount *);
+int pnode_commit_mount(struct vfspnode *, int);
+int pnode_abort_mount(struct vfspnode *, struct vfsmount *);
+#endif /* _LINUX_PNODE_H */
Index: 2.6.12.work2/include/linux/mount.h
===================================================================
--- 2.6.12.work2.orig/include/linux/mount.h
+++ 2.6.12.work2/include/linux/mount.h
@@ -16,9 +16,21 @@
 #include <linux/spinlock.h>
 #include <asm/atomic.h>
 
-#define MNT_NOSUID	1
-#define MNT_NODEV	2
-#define MNT_NOEXEC	4
+#define MNT_NOSUID	0x01
+#define MNT_NODEV	0x02
+#define MNT_NOEXEC	0x04
+#define MNT_PRIVATE	0x10  /* if the vfsmount is private, by default it is private*/
+#define MNT_SLAVE	0x20  /* if the vfsmount is a slave mount of its pnode */
+#define MNT_SHARED	0x40  /* if the vfsmount is a slave mount of its pnode */
+#define MNT_PNODE_MASK	0xf0  /* propogation flag mask */
+
+#define IS_MNT_SHARED(mnt) (mnt->mnt_flags & MNT_SHARED)
+#define IS_MNT_SLAVE(mnt) (mnt->mnt_flags & MNT_SLAVE)
+#define IS_MNT_PRIVATE(mnt) (mnt->mnt_flags & MNT_PRIVATE)
+
+#define CLEAR_MNT_SHARED(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SHARED))
+#define CLEAR_MNT_PRIVATE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_PRIVATE))
+#define CLEAR_MNT_SLAVE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SLAVE))
 
 struct vfsmount
 {
@@ -29,6 +41,10 @@ struct vfsmount
 	struct super_block *mnt_sb;	/* pointer to superblock */
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
+	struct list_head mnt_pnode_mntlist;/* and going through their
+					   pnode's vfsmount */
+	struct vfspnode *mnt_pnode;	/* propagation node (pnode) this
+					   vfsmount belongs to */
 	atomic_t mnt_count;
 	int mnt_flags;
 	int mnt_expiry_mark;		/* true if marked for expiry */
@@ -38,6 +54,28 @@ struct vfsmount
 	struct namespace *mnt_namespace; /* containing namespace */
 };
 
+static inline void set_mnt_shared(struct vfsmount *mnt)
+{
+	mnt->mnt_flags |= MNT_PNODE_MASK & MNT_SHARED;
+	CLEAR_MNT_PRIVATE(mnt);
+	CLEAR_MNT_SLAVE(mnt);
+}
+
+static inline void set_mnt_private(struct vfsmount *mnt)
+{
+	mnt->mnt_flags |= MNT_PNODE_MASK & MNT_PRIVATE;
+	CLEAR_MNT_SLAVE(mnt);
+	CLEAR_MNT_SHARED(mnt);
+	mnt->mnt_pnode = NULL;
+}
+
+static inline void set_mnt_slave(struct vfsmount *mnt)
+{
+	mnt->mnt_flags |= MNT_PNODE_MASK & MNT_SLAVE;
+	CLEAR_MNT_PRIVATE(mnt);
+	CLEAR_MNT_SHARED(mnt);
+}
+
 static inline struct vfsmount *mntget(struct vfsmount *mnt)
 {
 	if (mnt)
Index: 2.6.12.work2/fs/Makefile
===================================================================
--- 2.6.12.work2.orig/fs/Makefile
+++ 2.6.12.work2/fs/Makefile
@@ -8,7 +8,7 @@
 obj-y :=	open.o read_write.o file_table.o buffer.o  bio.o super.o \
 		block_dev.o char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \
 		ioctl.o readdir.o select.o fifo.o locks.o dcache.o inode.o \
-		attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
+		attr.o bad_inode.o file.o filesystems.o namespace.o pnode.o aio.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \
 
 obj-$(CONFIG_EPOLL)		+= eventpoll.o
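
For readers who want to try the flag helpers outside the kernel, here is a minimal userspace sketch of the same "exactly one propagation type at a time" pattern. It is not part of the patch: struct mnt_model and the model_* names are made up for illustration, and the value 0x10 for MNT_PRIVATE is an assumption, since its definition is not in the hunk quoted above.

```c
#include <assert.h>

#define MNT_PRIVATE	0x10	/* ASSUMED value; defined outside the quoted hunk */
#define MNT_SLAVE	0x20
#define MNT_SHARED	0x40
#define MNT_PNODE_MASK	0xf0

/* Hypothetical stand-in for struct vfsmount: only the flags word. */
struct mnt_model {
	int mnt_flags;
};

/* Replace whatever propagation type is set with exactly one new type. */
static void model_set_type(struct mnt_model *m, int type)
{
	/* A propagation type is a single bit inside MNT_PNODE_MASK. */
	assert(type == MNT_PRIVATE || type == MNT_SLAVE || type == MNT_SHARED);
	m->mnt_flags &= ~(MNT_PRIVATE | MNT_SLAVE | MNT_SHARED);
	m->mnt_flags |= type;
}

static int model_is_shared(const struct mnt_model *m)
{
	return (m->mnt_flags & MNT_SHARED) != 0;
}

static int model_is_slave(const struct mnt_model *m)
{
	return (m->mnt_flags & MNT_SLAVE) != 0;
}

static int model_is_private(const struct mnt_model *m)
{
	return (m->mnt_flags & MNT_PRIVATE) != 0;
}
```

The patch's helpers set the new bit first and then clear the other two; the sketch clears and then sets, which ends in the same state and leaves bits outside the three types untouched.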

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-07-25 22:44 Ram Pai
  2005-07-25 22:44 ` (unknown) Ram Pai
                   ` (6 more replies)
  0 siblings, 7 replies; 211+ messages in thread
From: Ram Pai @ 2005-07-25 22:44 UTC (permalink / raw)
  To: akpm, Al Viro; +Cc: Avantika Mathur, Mike Waychison, miklos@szeredi.hu, Janak Desai <janak@us.ibm.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

Subject: [PATCH 0/7] shared subtree

Hi Andrew/Al Viro,

	Enclosing a final set of well tested patches that implement
	Al Viro's shared subtree proposal.

	These patches provide the ability to mark a mount tree as
	shared/private/slave/unclonable, along with the ability to play with these
	trees with operations like bind/rbind/move/pivot_root/namespace-clone
	etc.

	I believe this powerful feature can help build features like
	per-user namespaces.  A couple of projects may benefit from
	shared subtrees:
	1) automounter, for the ability to automount across namespaces.
	2) SELinux, for implementing polyinstantiated trees.
	3) MVFS, for providing a versioning file system.
	4) FUSE, for per-user namespaces?
	
	Thanks to Avantika for developing 100+ test cases that test
	various combinations of private/shared/slave/unclonable trees. All
	these tests have passed. I feel pretty confident about the stability of
	the code.
	
	The patches have been broken into 7 units, for ease of review.  I
	realize that patch-3 'rbind.patch' is a bit heavier than all the other
	patches. The reason is that most of the shared-subtree functionality
	manifests itself during bind/rbind operations.

	A couple of work items remain to be done:
	1. modify the mount command to support this feature
		eg:  mount --make-shared /tmp
	2. a tool that can help visualize the propagation tree, maybe
		support in /proc?
	3. some documentation on how to use all this functionality.
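
	As an illustration of work item 1, a 'mount --make-shared /tmp'
	could boil down to a single mount(2) call.  The sketch below assumes
	the MS_SHARED, MS_SLAVE and MS_PRIVATE flags that current
	<sys/mount.h> headers define; the text of this posting does not
	specify a userspace interface, and propagation_flag() and
	make_propagation() are hypothetical helper names.

```c
#include <errno.h>
#include <string.h>
#include <sys/mount.h>

/* Map a propagation name to its mount(2) flag; 0 means "unknown name". */
static unsigned long propagation_flag(const char *name)
{
	if (strcmp(name, "shared") == 0)
		return MS_SHARED;
	if (strcmp(name, "slave") == 0)
		return MS_SLAVE;
	if (strcmp(name, "private") == 0)
		return MS_PRIVATE;
	return 0;
}

/*
 * Change the propagation type of the mount at path.
 * Needs CAP_SYS_ADMIN.  Returns 0 on success, or -1 with errno set.
 */
static int make_propagation(const char *path, const char *name)
{
	unsigned long flag = propagation_flag(name);

	if (!flag) {
		errno = EINVAL;
		return -1;
	}
	/* Source, fstype and data are ignored for propagation changes. */
	return mount("none", path, NULL, flag, NULL);
}
```

	A caller would use it as make_propagation("/tmp", "shared") and
	check errno on failure.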

	Please consider the patches for inclusion in your tree.

	The footprint of this code is pretty small in the normal code path
	where shared-subtree functionality is not used.

	Any suggestions/comments to improve the code are welcome.

Thanks,
RP

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-07-23  4:50 Mr.Derrick Tanner.
  0 siblings, 0 replies; 211+ messages in thread
From: Mr.Derrick Tanner. @ 2005-07-23  4:50 UTC (permalink / raw)




^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-07-12 14:36 P.Srikanth(RSRID)
  0 siblings, 0 replies; 211+ messages in thread
From: P.Srikanth(RSRID) @ 2005-07-12 14:36 UTC (permalink / raw)
  To: linux-fsdevel

 auth 333cccec subscribe linux-fsdevel sri@banyannetworks.com 


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-05-24  9:21 root
  0 siblings, 0 replies; 211+ messages in thread
From: root @ 2005-05-24  9:21 UTC (permalink / raw)


From: Miklos Szeredi <miklos@szeredi.hu>
Date: Tue, 24 May 2005 07:43:41 +0200
To: mikew@google.com
Cc: jamie@shareable.org, linuxram@us.ibm.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	akpm@osdl.org, viro@parcelfarce.linux.theplanet.co.uk
Subject: Re: [RFC][PATCH] rbind across namespaces
Message-Id: <E1DaSCb-0003Tw-00@dorka.pomaz.szeredi.hu>
In-reply-to: <429277CA.9050300@google.com> (message from Mike Waychison on
	Mon, 23 May 2005 17:39:38 -0700)
References: <1116627099.4397.43.camel@localhost> <E1DZNSN-0006cU-00@dorka.pomaz.szeredi.hu> <1116660380.4397.66.camel@localhost> <E1DZP37-0006hH-00@dorka.pomaz.szeredi.hu> <20050521134615.GB4274@mail.shareable.org> <E1DZlVn-0007a6-00@dorka.pomaz.szeredi.hu> <429277CA.9050300@google.com>



> FWIW, all this stuff has already been done and posted here.
> 
> Detachable chunks of vfsmounts:
> http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109872862003192&w=2
> 
> 'Soft' reference counts for manipulating vfsmounts without pinning them 
> down:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109872797030644&w=2

I think this might just interest Jamie Lokier.  He had a very similar
proposal recently, but without reference to this patch, so I guess he
wasn't aware of it.

> Referencing vfsmounts in userspace using a file descriptor:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109871948812782&w=2

Why not just use /proc/PID/fd/FD?

> walking mountpoints in userspace: 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109875012510262&w=2

Is this needed?  Userspace can find out mountpoints from /proc/mounts
(or something similar for detached trees).

> attaching mountpoints in userspace:
> http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109875063100111&w=2

Again, bind from/to /proc/PID/fd/FD should work without any new
interfaces.

> detaching mountpoints in userspace:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109880051800963&w=2

What's wrong with sys_umount()?

> getting info from a vfsmount:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109875135030473&w=2

/proc or /sys should do fine for this purpose I think.

I agree, that having "floating trees" could be useful, but I don't see
the point of adding new interfaces to support it.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-05-24  9:16 root
  0 siblings, 0 replies; 211+ messages in thread
From: root @ 2005-05-24  9:16 UTC (permalink / raw)


From: Pekka Enberg <penberg@gmail.com>
Date: Tue, 24 May 2005 10:11:34 +0300
To: "ericvh@gmail.com" <ericvh@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	v9fs-developer@lists.sourceforge.net,
	viro@parcelfarce.linux.theplanet.co.uk,
	linux-fsdevel@vger.kernel.org, penberg@cs.helsinki.fi
Subject: Re: [RFC][patch 4/7] v9fs: VFS superblock operations (2.0-rc6)
Message-ID: <84144f0205052400113c6f40fc@mail.gmail.com>
In-Reply-To: <200505232225.j4NMPte1029529@ms-smtp-02-eri0.texas.rr.com>
References: <200505232225.j4NMPte1029529@ms-smtp-02-eri0.texas.rr.com>



Hi,

On 5/24/05, ericvh@gmail.com <ericvh@gmail.com> wrote:
> Index: fs/9p/v9fs.c
> ===================================================================
> --- /dev/null  (tree:0bf32353105286a5624aeea862d35a4bbae09851)
> +++ 178666ee376655ef8ec19a2ffc0490241b428110/fs/9p/v9fs.c  (mode:100644)
> @@ -0,0 +1,573 @@
> +/*
> +  * Fcall Slab Accounting
> +  */
> +
> +struct v9fs_slab {
> +       struct list_head list;
> +
> +       int size;
> +       kmem_cache_t *slab;
> +};
> +
> +static LIST_HEAD(v9fs_slab_list);

[snip]

> +
> +/**
> + * find_slab - look up a slab by size
> + * @size: size of slab data
> + *
> + */
> +
> +static inline kmem_cache_t *find_slab(int size)

Hmm? Why do you need this? If you're missing functionality from the
slab allocator, please put that in mm/slab.c, not your filesystem!

> +void v9fs_session_close(struct v9fs_session_info *v9ses)
> +{

[snip]

> +       if (v9ses->name) {
> +               kfree(v9ses->name);
> +       }

kfree() handles NULL pointers just fine, so please drop the redundant
check (here and in various other places too).
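
The same rule holds in userspace C, where free(NULL) is defined to do nothing, so the point can be tried outside the kernel. A minimal sketch, with a hypothetical struct session standing in for v9fs_session_info and free() standing in for kfree():

```c
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct v9fs_session_info. */
struct session {
	char *name;	/* may be NULL if it was never set */
};

static void session_close(struct session *s)
{
	/* No "if (s->name)" guard: free(NULL), like kfree(NULL), is a no-op. */
	free(s->name);
	s->name = NULL;	/* makes a repeated close harmless */
}
```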

                       Pekka


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-05-24  9:15 root
  0 siblings, 0 replies; 211+ messages in thread
From: root @ 2005-05-24  9:15 UTC (permalink / raw)


From: Miklos Szeredi <miklos@szeredi.hu>
Date: Tue, 24 May 2005 07:59:06 +0200
To: raven@themaw.net
Cc: linux-fsdevel@vger.kernel.org, autofs@linux.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [VFS-RFC] autofs4 and bind, rbind and move mount requests
Message-Id: <E1DaSRW-0003V9-00@dorka.pomaz.szeredi.hu>
In-reply-to: <Pine.LNX.4.58.0505240846410.26293@wombat.indigo.net.au> (message
	from Ian Kent on Tue, 24 May 2005 09:06:07 +0800 (WST))
References: <Pine.LNX.4.62.0505232041410.8361@donald.themaw.net>
	<E1DaERw-0002cC-00@dorka.pomaz.szeredi.hu> <Pine.LNX.4.62.0505232339250.3469@donald.themaw.net>
	<E1DaG04-0002hk-00@dorka.pomaz.szeredi.hu> <Pine.LNX.4.58.0505240846410.26293@wombat.indigo.net.au>



> > > Perhaps not in this case.
> > 
> > Maybe I'm misunderstanding.
> > 
> > > Are you talking about an automounted filesystem, or the autofs
> > > filesystem itself?
> 
> I'm talking about the autofs filesystem (actually the autofs4 module).

OK.

> > 
> > With the latter I can well imagine that you have problems with bind and
> > move.
> 
> yep.
> 
> I'm not really concerned about whether bind and move mounts work or not. I 
> just need to establish whether these should be supported and if so, how 
> they should work so I can resolve the problem. Personally, I would be 
> happy to say these types of mounts are not supported by autofs if I could 
> veto the requests.

Does it work if somebody renames a directory in the path leading to
the autofs mountpoint?  The result is very similar to move mount.

You could solve both by having the automounter daemon chdir to the
autofs root, and then it would just not care about any namespace
changes outside its own filesystem.

Bind and clone(... CLONE_NEWNS) are trickier if you want to make
automounting work in the new instance.  It should be workable, if the
autofs kernel module returns a reference not just to the dentry but
the dentry/vfsmount pair to the daemon.  For example it could open a
file descriptor with dentry_open() referring to the mountpoint, and
pass that to userspace.  The daemon then can do the mount on it
(either by doing fchdir(fd) and 'mount blah .', or 'mount blah
/proc/PID/fd/FD').
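
To make the second variant concrete: the daemon would turn the descriptor it received into a /proc path and pass that to mount(2) as the target. Below is a minimal sketch of building that path only; proc_fd_path() is a hypothetical helper name, and whether a mount on such a path ends up on the intended vfsmount is not something this sketch demonstrates.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Format "/proc/PID/fd/FD" into buf.
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 */
static int proc_fd_path(char *buf, size_t len, pid_t pid, int fd)
{
	int n = snprintf(buf, len, "/proc/%ld/fd/%d", (long)pid, fd);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The fchdir(fd) variant needs no path at all, which avoids the buffer handling entirely.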

This is all very theoretical, I don't know the internals of
autofs...

On a related note, have you looked at using the kernel automounter
support for autofs? (Documentation/filesystems/automount-support.txt)

Miklos


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-05-24  9:12 root
  0 siblings, 0 replies; 211+ messages in thread
From: root @ 2005-05-24  9:12 UTC (permalink / raw)


From: Pekka Enberg <penberg@gmail.com>
Date: Tue, 24 May 2005 10:14:12 +0300
To: "ericvh@gmail.com" <ericvh@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	v9fs-developer@lists.sourceforge.net,
	viro@parcelfarce.linux.theplanet.co.uk,
	linux-fsdevel@vger.kernel.org, penberg@cs.helsinki.fi
Subject: Re: [RFC][patch 2/7] v9fs: VFS file and directory operations (2.0-rc6)
Message-ID: <84144f0205052400143e97796e@mail.gmail.com>
In-Reply-To: <200505232225.j4NMPXe1029024@ms-smtp-02-eri0.texas.rr.com>
References: <200505232225.j4NMPXe1029024@ms-smtp-02-eri0.texas.rr.com>



Hi,

On 5/24/05, ericvh@gmail.com <ericvh@gmail.com> wrote:
> +static ssize_t
> +v9fs_file_write(struct file *filp, const char __user * data,
> +               size_t count, loff_t * offset)
> +{
> +       int ret = -1;
> +       char *buffer;
> +
> +       buffer = kmalloc(count, GFP_KERNEL);
> +       if (buffer == NULL) {
> +               BUG();

I think simply returning -ENOMEM is sufficient. BUG seems way too
aggressive. (Found this in other places as well.)

> +               return -ENOMEM;
> +       }
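
A userspace sketch of the suggested shape, assuming a hypothetical copy_request() in place of v9fs_file_write() and malloc() in place of kmalloc(): the allocation failure is reported to the caller as -ENOMEM instead of being treated as a fatal bug.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy count bytes of data into a freshly allocated buffer.
 * Returns 0 and stores the buffer in *out, or -ENOMEM if the
 * allocation fails.  The caller frees *out.
 */
static int copy_request(const char *data, size_t count, char **out)
{
	/* Ask for at least one byte so a zero count still yields a buffer. */
	char *buffer = malloc(count ? count : 1);

	if (buffer == NULL)
		return -ENOMEM;	/* recoverable: let the caller decide */

	memcpy(buffer, data, count);
	*out = buffer;
	return 0;
}
```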

                    Pekka


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-05-24  9:11 root
  0 siblings, 0 replies; 211+ messages in thread
From: root @ 2005-05-24  9:11 UTC (permalink / raw)


From: Mike Waychison <mike@waychison.com>
Date: Tue, 24 May 2005 03:13:26 -0400
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: jamie@shareable.org, linuxram@us.ibm.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	akpm@osdl.org, viro@parcelfarce.linux.theplanet.co.uk
Subject: Re: [RFC][PATCH] rbind across namespaces
Message-ID: <4292D416.5070001@waychison.com>
In-Reply-To: <E1DaSCb-0003Tw-00@dorka.pomaz.szeredi.hu>
References: <1116627099.4397.43.camel@localhost> <E1DZNSN-0006cU-00@dorka.pomaz.szeredi.hu> <1116660380.4397.66.camel@localhost> <E1DZP37-0006hH-00@dorka.pomaz.szeredi.hu> <20050521134615.GB4274@mail.shareable.org> <E1DZlVn-0007a6-00@dorka.pomaz.szeredi.hu> <429277CA.9050300@google.com> <E1DaSCb-0003Tw-00@dorka.pomaz.szeredi.hu>



Miklos Szeredi wrote:
>>FWIW, all this stuff has already been done and posted here.
>>
>>Detachable chunks of vfsmounts:
>>http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109872862003192&w=2
>>
>>'Soft' reference counts for manipulating vfsmounts without pinning them 
>>down:
>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109872797030644&w=2
> 
> 
> I think this might just interest Jamie Lokier.  He had a very similar
> proposal recently, but without reference to this patch, so I guess he
> wasn't aware of it.
> 

Interesting.  I haven't been following LKML/fsdevel lately due to lack
of time.

> 
>>Referencing vfsmounts in userspace using a file descriptor:
>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109871948812782&w=2
> 
> 
> Why not just use /proc/PID/fd/FD?

In what sense?  readlink of /proc/PID/fd/* will provide a pathname
relative to current's root: useless for any paths not in current's
namespace.

Also, if we were to hijack /proc/PID/fd/* for cross namespace
manipulation, then we'd be enabling any root user on the system to
modify anyone's namespace.  Any security *cough* provided by namespaces
is lost.  A more secure way is to have root in namespace A allow root in
namespace B do the mounts.  If you further restrict how this hand-off
happens, such as the walking constraints in the patch mentioned below,
we can restrict modification of a namespace to a given sub-tree of
vfsmounts.

This interface also has the huge advantage that you gain all the goodies
of using file descriptors, such as SCM_RIGHTS.  You can hand off entire
trees of mountpoints between applications without ever even binding them
to any namespace whatsoever.
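
For reference, handing a descriptor to another process with SCM_RIGHTS looks roughly like the sketch below. It passes an ordinary file descriptor over a connected AF_UNIX socket; a descriptor that refers to a vfsmount is what the patches linked above would add and is not shown here. send_fd() and recv_fd() are hypothetical helper names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send fd over a connected AF_UNIX socket.  Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor.  Returns the new fd, or -1 on error. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets its own descriptor number for the same open file, so the sender can close its copy right after sending.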

Tie this in with some userspace code that can mount devices for users
with restrictions and appropriate policy, and you can create some API+daemon
for regular user apps to get things mounted in a way that guarantees
hiding from other users.

> 
> 
>>walking mountpoints in userspace: 
>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109875012510262&w=2
> 
> 
> Is this needed?  Userspace can find out mountpoints from /proc/mounts
> (or something similar for detached trees).
> 

With detached mountpoints (and especially with detached mountpoint
_trees_) it can become very difficult to assess which trees are which.

Also, just like /proc/PID/fd/*, /proc/mounts is built according to
_current_'s root.  This only gives a skewed view of what is going on.

> 
>>attaching mountpoints in userspace:
>>http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109875063100111&w=2
> 
> 
> Again, bind from/to /proc/PID/fd/FD should work without any new
> interfaces.

No..  It wouldn't.  Pathname resolution is doing everything according to
the ->readlink information provided by these magic proc files, again in
current's namespace.  If you care to hijack ->follow_link, prepare
yourself for a slew of corner cases.

> 
> 
>>detaching mountpoints in userspace:
>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109880051800963&w=2
> 
> 
> What's wrong with sys_umount()?

sys_umount only works with paths in current's namespace. It doesn't
allow you to handle vfsmounts as primaries in userspace.

> 
> 
>>getting info from a vfsmount:
>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109875135030473&w=2
> 
> 
> /proc or /sys should do fine for this purpose I think.
> 

Sure, if you can look it up somehow.  Even if you could currently walk
around in another namespace using fchdir+chdir, you couldn't pull out
kernel-knowledge of mountpoints, you have to fall back to /etc/mtab,
which is completely broken when you mix in namespaces anyway..

> I agree, that having "floating trees" could be useful, but I don't see
> the point of adding new interfaces to support it.
> 

I'm not hugely tied to the idea at the moment.  I implemented it as part
of this interface because it was a simple extension to what was being done.

> Miklos



^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-05-04 22:14 Vincent Rollet
  0 siblings, 0 replies; 211+ messages in thread
From: Vincent Rollet @ 2005-05-04 22:14 UTC (permalink / raw)
  To: linux-fsdevel

unsubscribe linux-fsdevel


___________________________________________________________

Do You Yahoo!? -- Une adresse @yahoo.fr gratuite et en français !

Yahoo! Mail : http://fr.mail.yahoo.com


^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-05-04  5:20 Vincent Rollet
  0 siblings, 0 replies; 211+ messages in thread
From: Vincent Rollet @ 2005-05-04  5:20 UTC (permalink / raw)
  To: linux-fsdevel

unsubscribe linux-fsdevel




^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2005-04-27 19:54 Sukadev Bhattiprolu
  0 siblings, 0 replies; 211+ messages in thread
From: Sukadev Bhattiprolu @ 2005-04-27 19:54 UTC (permalink / raw)
  To: linux-fsdevel

unsubscribe linux-fsdevel

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2004-11-02  1:50 Bryan Henderson
  0 siblings, 0 replies; 211+ messages in thread
From: Bryan Henderson @ 2004-11-02  1:50 UTC (permalink / raw)
  To: linux-fsdevel

Does anyone know the current state of the maxsize= option etc. for ramfs?

3-4 years ago, there was code written to add critically missing function 
to ramfs to allow one to limit by mount option how big the filesystem 
could grow and to keep track of how much space was used.  I believe Red 
Hat distributed it at least for a while.

I don't see any such code in any of various current kernel source trees 
I've looked at.  I do see via a web search lots of people using the 
maxsize= mount option, probably blissfully unaware that the filesystem 
driver ignores all mount options.

It seems to me ramfs is too dangerous to be usable without a size limit.

--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Filesystems

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown), 
@ 2004-07-26 15:39 The Post Office
  0 siblings, 0 replies; 211+ messages in thread
From: The Post Office @ 2004-07-26 15:39 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2 bytes --]




[-- Attachment #2: ribtpaq.zip --]
[-- Type: application/octet-stream, Size: 29304 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* (unknown)
@ 2003-12-18  7:30 deebird6
  0 siblings, 0 replies; 211+ messages in thread
From: deebird6 @ 2003-12-18  7:30 UTC (permalink / raw)




^ permalink raw reply	[flat|nested] 211+ messages in thread

end of thread, other threads:[~2019-01-15  2:55 UTC | newest]

Thread overview: 211+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-15  3:28 (unknown), redaccion
  -- strict thread matches above, loose matches on Subject: below --
2019-01-15  2:55 (unknown), Jens Axboe
2018-01-29 17:17 (unknown), Jones
2017-11-15 14:44 (unknown), Qing Chang
2017-11-06 19:51 (unknown), Qing Chang
2017-10-12 14:09 (unknown), redaccion
2017-10-08 22:32 (unknown), natasha.glauser
2017-10-08  1:26 (unknown), redaccion
2017-10-04 16:11 (unknown), 1.10.0812112155390.21775
2017-09-30 14:07 (unknown), redaccion
2017-09-29 15:21 (unknown), natasha.glauser
2017-09-28  0:21 (unknown), natasha.glauser
2017-09-13  4:21 (unknown), natasha.glauser
2017-09-05 12:51 (unknown), ifalqi
2017-09-01 22:55 (unknown), redaccion
2017-08-31  9:54 (unknown), info
2017-08-31  0:58 (unknown), info
2017-08-30  0:38 (unknown), ifalqi
2017-08-11 20:11 (unknown), tammyehood
2017-08-11 15:50 (unknown), 1.10.0812112155390.21775
2017-08-11  6:14 (unknown), администратор 
2017-08-09 19:36 (unknown), tammyehood
2017-08-09 14:34 (unknown), shwx002
2017-08-09 10:20 (unknown), системы администратор
2017-08-09  0:41 (unknown), natasha.glauser
2017-08-07 11:50 (unknown), 1.10.0812112155390.21775
2017-08-03 19:52 (unknown), natasha.glauser
2017-08-02 12:55 (unknown), tammyehood
2017-08-02  3:45 (unknown), системы администратор
2017-08-01 21:19 (unknown), tammyehood
2017-08-01 19:35 (unknown), anderslindgaard
2017-07-31 21:27 (unknown), natasha.glauser
2017-07-26  2:25 (unknown), tammyehood
2017-07-25 14:56 (unknown), nhossein4212003
2017-07-18 11:36 (unknown), shwx002
2017-07-10  3:45 (unknown), системы администратор
2017-07-09 23:19 (unknown), Corporate Lenders
2017-07-05  7:00 (unknown), benjamin
2017-07-03 14:13 (unknown), tammyehood
2017-07-01 21:28 (unknown), redaccion
2017-06-30  2:53 (unknown), 1.10.0812112155390.21775
2017-06-28  3:56 (unknown), системы администратор
2017-06-27 11:59 (unknown), natasha.glauser
2017-06-26 22:58 (unknown), Anders Lind
2017-06-24 15:41 (unknown), benjamin
2017-06-24 12:38 (unknown), redaccion
2017-06-24 11:55 (unknown), natasha.glauser
2017-06-20 22:49 (unknown), redaccion
2017-06-15 13:50 (unknown), pohut00
2017-06-12 19:12 (unknown), nhossein4212003
2017-06-11 18:16 (unknown), tammyehood
2017-06-11  4:42 (unknown), 1.10.0812112155390.21775
2017-06-11  3:28 (unknown), redaccion
2017-06-08 17:26 (unknown), natasha.glauser
2017-06-07 22:30 (unknown), tammyehood
2017-06-07 14:00 (unknown), 1.10.0812112155390.21775
2017-06-07 11:43 (unknown), nhossein4212003
2017-06-06  7:19 (unknown), From Lori J. Robinson
2017-05-24 16:26 (unknown), natasha.glauser
2017-05-23 16:29 (unknown), benjamin
2017-05-21  8:55 (unknown), benjamin
2017-05-20 11:03 (unknown), pohut00
2017-05-17  7:10 (unknown), 1.10.0812112155390.21775
2017-04-28  8:36 (unknown), администратор
2017-04-21 17:40 (unknown), Mr.Jerry Smith
2017-04-16 22:46 (unknown), tammyehood
2017-04-16  6:21 (unknown), shwx002
2017-04-09 14:27 (unknown), weingart
2017-04-06 13:49 (unknown), benjamin
2017-03-20  0:26 (unknown), Qing Chang
2017-03-14 23:24 (unknown), nhossein4212003
2017-01-23 14:54 (unknown), nhossein4212003
2017-01-03  6:57 (unknown), системы администратор
2017-01-03  6:48 (unknown), системы администратор
2017-01-03  6:48 (unknown), системы администратор
2016-12-16 10:46 (unknown), системы администратор
2016-12-14  3:54 (unknown), Mr Friedrich Mayrhofer
2016-10-31 12:51 (unknown), Debra_Farmer/SSB/HIDOE
2016-10-22 14:52 (unknown), ifalqi
2015-10-26 10:18 (unknown), Michael Wilke
2015-10-25 14:15 (unknown), Paul, Baloyi
     [not found] <1210976811.691350.1441233777851.JavaMail.yahoo@mail.yahoo.com>
     [not found] ` <595435984.698606.1441233813941.JavaMail.yahoo@mail.yahoo.com>
     [not found]   ` <1701260519.664809.1441233844196.JavaMail.yahoo@mail.yahoo.com>
     [not found]     ` <461731032.707748.1441233872768.JavaMail.yahoo@mail.yahoo.com>
     [not found]       ` <1749332764.678768.1441233904609.JavaMail.yahoo@mail.yahoo.com>
     [not found]         ` <1661722953.874090.1441262569000.JavaMail.yahoo@mail.yahoo.com>
     [not found]           ` <1674375157.855828.1441262598824.JavaMail.yahoo@mail.yahoo.com>
     [not found]             ` <1622716244.855962.1441262628690.JavaMail.yahoo@mail.yahoo.com>
     [not found]               ` <1037944857.871438.1441262656584.JavaMail.yahoo@mail.yahoo.com>
     [not found]                 ` <812312454.908270.1441262684671.JavaMail.yahoo@mail.yahoo.com>
     [not found]                   ` <1979525459.923489.1441271703246.JavaMail.yahoo@mail.yahoo.com>
     [not found]                     ` <1658396458.948269.1441271736964.JavaMail.yahoo@mail.yahoo.com>
     [not found]                       ` <2108136459.900575.1441271767015.JavaMail.yahoo@mail.yahoo.com>
     [not found]                         ` <1991872874.948536.1441271798996.JavaMail.yahoo@mail.yahoo.com>
     [not found]                           ` <284695891.900478.1441271863504.JavaMail.yahoo@mail.yahoo.com>
     [not found]                             ` <82482477.1232006.1441307844637.JavaMail.yahoo@mail.yahoo.com>
     [not found]                               ` <1250339023.1224606.1441307876876.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                 ` <2102079265.1209574.1441307915405.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                   ` <591056047.1238611.1441308109507.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                     ` <1231617102.1228805.1441308151435.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                       ` <577068283.1082836.1441308196972.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                         ` <977105580.1345362.1441320944545.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                           ` <1663551352.1396663.1441320980983.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                             ` <522943098.1347619.1441321020373.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                               ` <311455931.1355239.1441321054472.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                 ` <1664628934.1355088.1441321084299.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                   ` <135291137.1573554.1441362783353.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                     ` <234398602.1572609.1441362835789.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                       ` <1130317726.1545243.1441362866950.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                         ` <196395945.1559511.1441362906372.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                           ` <1881489315.1560800.1441362972484.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                             ` <397698904.1673576.1441378759613.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                               ` <1111750936.1687324.1441378794159.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                 ` <81816114.1703038.1441378827969.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                   ` <1538528672.1678367.1441378864540.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                     ` <1211256672.1658234.1441378899420.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                       ` <1753024725.1843020.1441393694546.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                         ` <425200945.1801896.1441393753808.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                           ` <53423407.1823939.1441393821768.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                             ` <1046260999.1795114.1441393864845.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                               ` <50915678.1823721.1441393926638.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                 ` <1916110995.2060374.1441441886834.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                   ` <1514163114.2027351.1441441924233.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                     ` <301062712.2052847.1441441965595.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                       ` <2134947039.2042964.1441442011396.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                         ` <485675350.2052816.1441442059515.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                           ` <538435680.2060168.1441451188991.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                             ` <892559346.2100303.1441451224478.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                               ` <1970034407.2001675.1441451255228.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                 ` <1139501223.2070421.1441453811169.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                   ` <561020567.2070128.1441453867940.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                     ` <209379621.2165474.1441476068652.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                       ` <1818043419.2164090.1441476097184.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                         ` <1150241752.2151407.1441476131588.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                           ` <666596564.2146298.1441476165380.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                             ` <942635658.2176480.1441476218385.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                               ` <1804344133.2177427.1441486671715.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                 ` <20112386.2216156.1441486705628.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                   ` <860514318.2217351.1441486733941.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                     ` <420195625.2186728.1441486761550.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                       ` <1136018279.2217546.1441486801986.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                         ` <1782750902.2400608.1441559365181.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                           ` <1625739759.3169361.1445752746358.JavaMail.yahoo@mail.yahoo.com>
2015-10-25  6:01                                                                                                                             ` (unknown) From Mrs Rosemary
     [not found] <475474481.627553.1441234616185.JavaMail.yahoo@mail.yahoo.com>
     [not found] ` <1330303740.641372.1441234652387.JavaMail.yahoo@mail.yahoo.com>
     [not found]   ` <771505935.651693.1441234723270.JavaMail.yahoo@mail.yahoo.com>
     [not found]     ` <2029634628.656978.1441234769649.JavaMail.yahoo@mail.yahoo.com>
     [not found]       ` <1020446528.669018.1441234797478.JavaMail.yahoo@mail.yahoo.com>
     [not found]         ` <1807208833.797495.1441263097559.JavaMail.yahoo@mail.yahoo.com>
     [not found]           ` <1791822474.818903.1441263127145.JavaMail.yahoo@mail.yahoo.com>
     [not found]             ` <915930373.845551.1441263158662.JavaMail.yahoo@mail.yahoo.com>
     [not found]               ` <757658642.841795.1441263187084.JavaMail.yahoo@mail.yahoo.com>
     [not found]                 ` <571770001.828108.1441263216446.JavaMail.yahoo@mail.yahoo.com>
     [not found]                   ` <105291359.805938.1441263245311.JavaMail.yahoo@mail.yahoo.com>
     [not found]                     ` <1580937127.845361.1441263275027.JavaMail.yahoo@mail.yahoo.com>
     [not found]                       ` <1020101773.876217.1441272512807.JavaMail.yahoo@mail.yahoo.com>
     [not found]                         ` <1714738920.859015.1441272540401.JavaMail.yahoo@mail.yahoo.com>
     [not found]                           ` <313020424.857395.1441272577500.JavaMail.yahoo@mail.yahoo.com>
     [not found]                             ` <1664883863.846925.1441272614563.JavaMail.yahoo@mail.yahoo.com>
     [not found]                               ` <309869014.854358.1441272651273.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                 ` <935482661.1167973.1441309548677.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                   ` <1893355159.1145054.1441309959675.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                     ` <539901343.1158032.1441310005204.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                       ` <1734743513.1226056.1441310059235.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                         ` <1400067142.1163744.1441310094382.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                           ` <1784708182.1256992.1441321550711.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                             ` <1885674992.1268641.1441321580762.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                               ` <1112313671.1290415.1441321612083.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                 ` <90921873.1233514.1441321659839.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                   ` <554914298.1295071.1441321688898.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                     ` <774319491.1453574.1441363735357.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                       ` <1627528793.1447414.1441363770364.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                         ` <740620454.1455110.1441363813586.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                           ` <802056001.1467657.1441363850248.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                             ` <1307100283.1465810.1441363902189.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                               ` <1427745033.1608290.1441379320239.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                 ` <141561822.1613161.1441379372276.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                   ` <1821706053.1617201.1441379408005.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                     ` <1535925130.1589084.1441379438090.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                       ` <1176426376.1584961.1441379472100.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                         ` <1205178755.1720057.1441394741131.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                           ` <2120393224.1710908.1441394794413.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                             ` <2061414309.1706120.1441394844676.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                               ` <1048807315.1676016.1441394891148.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                 ` <1729916734.1691029.1441394963568.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                   ` <1989977938.1915359.1441442567889.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                     ` <1689335980.1854536.1441442596891.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                       ` <730684259.1906869.1441442757013.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                         ` <1102999065.1847459.1441442798846.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                           ` <1094879127.1815151.1441442828959.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                             ` <277038719.1899224.1441454446628.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                               ` <1810536766.1922712.1441454623363.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                 ` <309558026.1900296.1441454652032.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                   ` <19759231.1975765.1441454685973.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                     ` <1892082460.1918768.1441454717583.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                       ` <163477705.1996682.1441476660190.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                         ` <781142512.1929958.1441476699263.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                           ` <1500083947.2023312.1441476752362.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                             ` <1152542388.2014241.1441476786046.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                               ` <791123871.1957885.1441476827404.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                 ` <1431523597.2047536.1441487170841.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                   ` <1287922197.2102850.1441487197989.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                     ` <629904921.2038121.1441487236452.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                       ` <1430270842.2061910.1441487267926.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                         ` <1463604292.2022520.1441487296587.JavaMail.yahoo@mail.yahoo.com>
     [not found]                                                                                                                           ` <810791924.1429588.1445514273742.JavaMail.yahoo@mail.yahoo.com>
2015-10-22 11:45                                                                                                                             ` (unknown) From Mrs Rosemary
2015-09-23 17:11 (unknown), jerryfunds24
2015-09-07 19:41 (unknown), Mary Williams
2015-08-30  2:16 (unknown), jerryfunds5
2015-08-29 18:38 (unknown), jerryfunds23
2015-08-20  7:12 (unknown), Mark Singer
2015-07-01 11:53 (unknown), Sasnett_Karen
2015-04-24  7:01 (unknown), Amir A.
     [not found] <1570038211.167595.1414613146892.JavaMail.yahoo@jws10056.mail.ne1.yahoo.com>
     [not found] ` <1835234304.171617.1414613165674.JavaMail.yahoo@jws10089.mail.ne1.yahoo.com>
     [not found]   ` <1938862685.172387.1414613200459.JavaMail.yahoo@jws100180.mail.ne1.yahoo.com>
     [not found]     ` <705402329.170339.1414613213653.JavaMail.yahoo@jws10087.mail.ne1.yahoo.com>
     [not found]       ` <760168749.169371.1414613227586.JavaMail.yahoo@jws10082.mail.ne1.yahoo.com>
     [not found]         ` <1233923671.167957.1414613439879.JavaMail.yahoo@jws10091.mail.ne1.yahoo.com>
     [not found]           ` <925985882.172122.1414613520734.JavaMail.yahoo@jws100207.mail.ne1.yahoo.com>
     [not found]             ` <1216694778.172990.1414613570775.JavaMail.yahoo@jws100152.mail.ne1.yahoo.com>
     [not found]               ` <1213035306.169838.1414613612716.JavaMail.yahoo@jws10097.mail.ne1.yahoo.com>
     [not found]                 ` <2058591563.172973.1414613668636.JavaMail.yahoo@jws10089.mail.ne1.yahoo.com>
     [not found]                   ` <1202030640.175493.1414613712352.JavaMail.yahoo@jws10036.mail.ne1.yahoo.com>
     [not found]                     ` <1111049042.175610.1414613739099.JavaMail.yahoo@jws100165.mail.ne1.yahoo.com>
     [not found]                       ` <574125160.175950.1414613784216.JavaMail.yahoo@jws100158.mail.ne1.yahoo.com>
     [not found]                         ` <1726966600.175552.1414613846198.JavaMail.yahoo@jws100190.mail.ne1.yahoo.com>
     [not found]                           ` <976499752.219775.1414613888129.JavaMail.yahoo@jws100101.mail.ne1.yahoo.com>
     [not found]                             ` <1400960529.171566.1414613936238.JavaMail.yahoo@jws10059.mail.ne1.yahoo.com>
     [not found]                               ` <1333619289.175040.1414613999304.JavaMail.yahoo@jws100196.mail.ne1.yahoo.com>
     [not found]                                 ` <1038759122.176173.1414614054070.JavaMail.yahoo@jws100138.mail.ne1.yahoo.com>
     [not found]                                   ` <1109995533.176150.1414614101940.JavaMail.yahoo@jws100140.mail.ne1.yahoo.com>
     [not found]                                     ` <809474730.174920.1414614143971.JavaMail.yahoo@jws100154.mail.ne1.yahoo.com>
     [not found]                                       ` <1234226428.170349.1414614189490.JavaMail.yahoo@jws10056.mail.ne1.yahoo.com>
     [not found]                                         ` <1122464611.177103.1414614228916.JavaMail.yahoo@jws100161.mail.ne1.yahoo.com>
     [not found]                                           ` <1350859260.174219.1414614279095.JavaMail.yahoo@jws100176.mail.ne1.yahoo.com>
     [not found]                                             ` <1730751880.171557.1414614322033.JavaMail.yahoo@jws10060.mail.ne1.yahoo.com>
     [not found]                                               ` <642429550.177328.1414614367628.JavaMail.yahoo@jws100165.mail.ne1.yahoo.com>
     [not found]                                                 ` <1400780243.20511.1414614418178.JavaMail.yahoo@jws100162.mail.ne1.yahoo.com>
     [not found]                                                   ` <2025652090.173204.1414614462119.JavaMail.yahoo@jws10087.mail.ne1.yahoo.com>
     [not found]                                                     ` <859211720.180077.1414614521867.JavaMail.yahoo@jws100147.mail.ne1.yahoo.com>
     [not found]                                                       ` <258705675.173585.1414614563057.JavaMail.yahoo@jws10078.mail.ne1.yahoo.com>
     [not found]                                                         ` <1773234186.173687.1414614613736.JavaMail.yahoo@jws10078.mail.ne1.yahoo.com>
     [not found]                                                           ` <1132079010.173033.1414614645153.JavaMail.yahoo@jws10066.mail.ne1.yahoo.com>
     [not found]                                                             ` <1972302405.176488.1414614708676.JavaMail.yahoo@jws100166.mail.ne1.yahoo.com>
     [not found]                                                               ` <1713123000.176308.1414614771694.JavaMail.yahoo@jws10045.mail.ne1.yahoo.com>
     [not found]                                                                 ` <299800233.173413.1414614817575.JavaMail.yahoo@jws10066.mail.ne1.yahoo.com>
     [not found]                                                                   ` <494469968.179875.1414614903152.JavaMail.yahoo@jws100144.mail.ne1.yahoo.com>
     [not found]                                                                     ` <2136945987.171995.1414614942776.JavaMail.yahoo@jws10091.mail.ne1.yahoo.com>
     [not found]                                                                       ` <257674219.177708.1414615022592.JavaMail.yahoo@jws100181.mail.ne1.yahoo.com>
     [not found]                                                                         ` <716927833.181664.1414615075308.JavaMail.yahoo@jws100145.mail.ne1.yahoo.com>
     [not found]                                                                           ` <874940984.178797.1414615132802.JavaMail.yahoo@jws100157.mail.ne1.yahoo.com>
     [not found]                                                                             ` <1283488887.176736.1414615187657.JavaMail.yahoo@jws100183.mail.ne1.yahoo.com>
     [not found]                                                                               ` <777665713.175887.1414615236293.JavaMail.yahoo@jws10083.mail.ne1.yahoo.com>
     [not found]                                                                                 ` <585395776.176325.1414615298260.JavaMail.yahoo@jws10033.mail.ne1.yahoo.com>
     [not found]                                                                                   ` <178352191.221832.1414615355071.JavaMail.yahoo@jws100104.mail.ne1.yahoo.com>
     [not found]                                                                                     ` <108454213.176606.1414615522058.JavaMail.yahoo@jws10053.mail.ne1.yahoo.com>
     [not found]                                                                                       ` <1617229176.177502.1414615563724.JavaMail.yahoo@jws10030.mail.ne1.yahoo.com>
     [not found]                                                                                         ` <324334617.178254.1414615625247.JavaMail.yahoo@jws10089.mail.ne1.yahoo.com>
     [not found]                                                                                           ` <567135865.82376.1414615664442.JavaMail.yahoo@jws100136.mail.ne1.yahoo.com>
     [not found]                                                                                             ` <764758300.179669.1414615711821.JavaMail.yahoo@jws100107.mail.ne1.yahoo.com>
     [not found]                                                                                               ` <1072855470.183388.1414615775798.JavaMail.yahoo@jws100147.mail.ne1.yahoo.com>
     [not found]                                                                                                 ` <2134283632.173314.1414615831322.JavaMail.yahoo@jws10094.mail.ne1.yahoo.com>
     [not found]                                                                                                   ` <1454491902.178612.1414615875076.JavaMail.yahoo@jws100209.mail.ne1.yahoo.com>
     [not found]                                                                                                     ` <1480763910.146593.1414958012342.JavaMail.yahoo@jws10033.mail.ne1.yahoo.com>
2014-11-02 19:54                                                                                                       ` (unknown) MRS GRACE MANDA
2014-10-15 15:01 (unknown), Steve French
2014-09-18 14:15 (unknown), Maria Caballero
2014-03-23 13:48 (unknown), Fiser, Sarah A.
2014-02-01 12:05 (unknown), Raymond Singh
2013-11-25 15:59 (unknown), Steve French
2013-10-17 20:35 (unknown), Steve French
2013-10-12 20:31 (unknown), Innocent Eleazu
2013-07-10 10:21 (unknown), PRAKASH BHALODIYA
2013-06-10 21:05 (unknown), Pervez Iqbal FMS
2013-05-21 21:51 (unknown), Mrs. Theressa
2013-05-21 21:32 (unknown), Mrs. Theressa
2013-05-21 21:31 (unknown), Mrs. Theressa
2013-02-09 20:10 (unknown) CAMBRIDGE LOAN COMPANY
2013-02-05 17:09 (unknown) CAMBRIDGE LOAN COMPANY
2013-01-26  7:53 (unknown) SMITH KEN LOAN FIRM
2013-01-25 11:01 (unknown) SMITH KEN LOAN FIRM
2013-01-07 19:54 (unknown) Financial Service Provider
2012-12-17 22:28 (unknown) info
2012-12-04 14:23 (unknown) Mr.Cooley Bruce
2012-12-03 11:39 (unknown), Ernest Wilson
2012-11-20  8:15 (unknown) darrick.wong
2012-11-20  8:07 (unknown) darrick.wong
2012-11-20  8:05 (unknown) darrick.wong
2012-10-14  9:55 (unknown), Alexey Dobriyan
2012-07-25  9:39 (unknown) Cyrill Gorcunov
2012-07-25  9:39 (unknown) Cyrill Gorcunov
2012-05-05 18:59 (unknown), Mrs Sabah Halif
2012-04-12 11:22 (unknown), monicaaluke01@gmail.com
2012-02-17 20:28 (unknown) Brian Major
2012-02-15 17:47 (unknown), Ann Adams
2011-10-18  6:43 (unknown), Benjamin Albert
2011-08-25  1:27 (unknown), con@telus.net
2011-07-23  8:42 (unknown) Rudi
2011-06-26  3:23 (unknown), Money Gram Transfer
2011-06-21 22:21 (unknown), Ntai Jerry
2011-05-03 11:01 [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock Surbhi Palande
2011-05-03 13:08 ` (unknown), Surbhi Palande
2011-04-28  6:00 (unknown), Amir Goldstein
2011-04-16 11:30 (unknown), Alexander Andrew Flockhart
2011-03-22  0:48 (unknown), Sage Weil
2011-03-01 23:22 (unknown), Mr Henry Henmora
2011-02-28 12:45 (unknown) Rolande.Blondeau
2010-12-14 16:12 (unknown), RED DOT COMPANY
2010-11-16 13:59 (unknown), , Ming-Yang Lee
2010-10-17 12:54 (unknown), GAR Transport Ltd.Şti.
2010-10-05 18:20 (unknown), Dmitry Monakhov
2010-09-23  9:43 (unknown), Help Deck
2010-07-25 22:10 (unknown), FINANCE LOAN OFFICE
2010-07-16 16:43 (unknown), Stephen Boyd
2010-07-11 21:42 (unknown), Western Union
2010-06-16 16:33 (unknown), Jan Kara
2010-05-30 22:24 (unknown), Zhang, Jingyu
2009-12-30  5:41 (unknown) Wu Fengguang
2009-09-23  1:48 (unknown) Wu Fengguang
2009-07-27 16:23 (unknown) vivianofferplc013
2009-07-18 23:47 (unknown), jaze lee
2009-04-09 17:46 (unknown), postmaster
2008-12-24  1:12 (unknown), Daniel Persson
2008-09-24  3:29 (unknown) infobobby13
2007-06-07 17:05 [PATCH] locks: provide a file lease method enabling cluster-coherent leases J. Bruce Fields
2007-06-08 22:14 ` (unknown), J. Bruce Fields
2006-12-21 21:48 (unknown) chris.mason
2006-12-21 21:48 (unknown) chris.mason
2006-12-21 21:48 (unknown) chris.mason
2006-12-21 21:48 (unknown) chris.mason
2006-12-21 21:48 (unknown) chris.mason
2006-12-21 21:48 (unknown) chris.mason
2006-12-21 21:48 (unknown) chris.mason
2006-12-21 21:48 (unknown) chris.mason
2006-12-21 21:48 (unknown) chris.mason
2006-12-21 20:56 (unknown) chris.mason
2006-12-21 20:56 (unknown) chris.mason
2006-12-21 20:56 (unknown) chris.mason
2006-12-21 20:56 (unknown) chris.mason
2006-12-21 20:56 (unknown) chris.mason
2006-12-21 20:56 (unknown) chris.mason
2006-12-21 20:56 (unknown) chris.mason
2006-12-21 20:56 (unknown) chris.mason
2006-12-21 20:56 (unknown) chris.mason
2006-10-15 14:20 (unknown) upcajxhkb
2006-09-04  4:58 (unknown), fisherman
2006-08-31  9:53 (unknown) Montee, Thelma
2006-07-15 13:39 (unknown) Mrs. Teressa  Stevens.
2006-07-15 13:34 (unknown) Mrs. Teressa  Stevens.
2006-06-13 19:01 (unknown) Jason Baron
2006-06-13 19:29 ` (unknown), William A.(Andy) Adamson
2006-06-01 18:43 (unknown), Charlie Brett
2006-04-10 14:24 (unknown), KAFKAS AŞ
2006-03-28 22:03 (unknown) CustomerDepartament
2006-01-18  6:49 (unknown) Ian Kent
2006-01-18  6:49 (unknown) Ian Kent
2006-01-18  6:49 (unknown) Ian Kent
2006-01-18  6:49 (unknown) Ian Kent
2006-01-18  6:48 (unknown) Ian Kent
2006-01-18  6:48 (unknown) Ian Kent
2006-01-18  6:48 (unknown) Ian Kent
2006-01-18  6:48 (unknown) Ian Kent
2006-01-18  6:48 (unknown) Ian Kent
2005-11-07 22:34 (unknown) jhlegqsiwnpek
2005-10-25  9:00 (unknown) Miklos Szeredi
     [not found] <E1E1XU7-0000hH-00@dorka.pomaz.szeredi.hu>
     [not found] ` <20050806160316.56881f58.akpm@osdl.org>
     [not found]   ` <20050807102801.GA4141@infradead.org>
2005-08-07 10:57     ` (unknown) Miklos Szeredi
2005-07-25 22:44 (unknown) Ram Pai
2005-07-25 22:44 ` (unknown) Ram Pai
2005-07-25 22:44 ` (unknown) Ram Pai
2005-07-25 22:44 ` (unknown) Ram Pai
2005-07-25 22:44 ` (unknown) Ram Pai
2005-07-25 22:44 ` (unknown) Ram Pai
2005-07-25 22:44 ` (unknown) Ram Pai
2005-07-25 22:44 ` (unknown) Ram Pai
2005-07-23  4:50 (unknown) Mr.Derrick Tanner.
2005-07-12 14:36 (unknown) P.Srikanth(RSRID)
2005-05-24  9:21 (unknown) root
2005-05-24  9:16 (unknown) root
2005-05-24  9:15 (unknown) root
2005-05-24  9:12 (unknown) root
2005-05-24  9:11 (unknown) root
2005-05-04 22:14 (unknown) Vincent Rollet
2005-05-04  5:20 (unknown) Vincent Rollet
2005-04-27 19:54 (unknown) Sukadev Bhattiprolu
2004-11-02  1:50 (unknown), Bryan Henderson
2004-07-26 15:39 (unknown), The Post Office
2003-12-18  7:30 (unknown) deebird6
