Discussion:
Simultaneously mounting one XFS partition on multiple machines
Patrick J. LoPresti
2011-01-04 17:46:39 UTC
Permalink
Hey, what's the worst that could happen?

I recently learned that some of my colleagues have configured two
Linux systems to simultaneously mount a single XFS partition residing
on shared storage. Specifically, "system R" has the partition mounted
read-only while "system W" has it mounted read/write.

I told them that this sounds like a very bad idea because XFS is not a
clustered file system. But they are skeptical because "it seems to be
working fine". I need to know what the actual risks are and whether
they can be mitigated.

This partition holds large amounts of essentially archival data; that
is, it is read frequently but written rarely. When they do want to
write to it, they do so via system W and then reboot system R.

I am no expert on XFS, but there are essentially two risks that I can see:

Risk 1: When making changes via system W, the view of the file system
from system R can become corrupted or inconsistent. My colleagues are
aware of this and believe they can live with it, as long as the
underlying file system is not being damaged ("we can just reboot").

Risk 2: Any time the file system is mounted, even read-only, it will
replay the journal if it is non-empty. (At least, I believe this is
true. Could one of you please confirm or deny?) So if machine R
should reboot while the journal is non-empty, it will replay it,
causing fairly unpredictable on-disk corruption.


Here are my questions.

1) When can a read-only XFS mount write to the disk, exactly?

2) If I do a "sync" on machine W (and perform no further writes), will
that truncate the journal?

3) What am I missing?


If your answer is "Please do not do this; get a clustered filesystem",
then trust me, you are preaching to the choir. But these systems are
already in use and unlikely to be replaced soon, so at this point my
job is to find out what the exact risks are. Any information will be
appreciated.

Thanks!

- Pat
Dave Chinner
2011-01-04 21:53:06 UTC
Permalink
Post by Patrick J. LoPresti
Hey, what's the worst that could happen?
That's just asking for trouble. ;)
Post by Patrick J. LoPresti
I recently learned that some of my colleagues have configured two
Linux systems to simultaneously mount a single XFS partition residing
on shared storage. Specifically, "system R" has the partition mounted
read-only while "system W" has it mounted read/write.
I told them that this sounds like a very bad idea because XFS is not a
clustered file system. But they are skeptical because "it seems to be
working fine". I need to know what the actual risks are and whether
they can be mitigated.
Ok, so it will appear to work fine most of the time...
Post by Patrick J. LoPresti
This partition holds large amounts of essentially archival data; that
is, it is read frequently but written rarely. When they do want to
write to it, they do so via system W and then reboot system R.
You could probably just run "echo 3 > /proc/sys/vm/drop_caches" or
just umount/mount the device again to get the same effect as
rebooting.
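In concrete terms, something like this on system R (the mount point
and device name here are placeholders for whatever you actually use):

  # on system R, after system W has finished writing
  echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries, inodes

  # or the heavier variant: cycle the mount entirely
  umount /mnt/archive
  mount -o ro,norecovery /dev/sdb1 /mnt/archive   # norecovery: see below

drop_caches only discards clean cached objects, which is all a
read-only mount should be holding anyway.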
Post by Patrick J. LoPresti
Risk 1: When making changes via system W, the view of the file system
from system R can become corrupted or inconsistent. My colleagues are
aware of this and believe they can live with it, as long as the
underlying file system is not being damaged ("we can just reboot").
Yup, so long as system R does not cache anything, or the caches are
dropped after system W writes, you should be fine. However, there is
a window between system W starting to write and system R being
rebooted in which system R could read inconsistent metadata and/or
data. There's not much you can do about that apart from taking
system R offline while system W is writing.
Post by Patrick J. LoPresti
Risk 2: Any time the file system is mounted, even read-only, it will
replay the journal if it is non-empty. (At least, I believe this is
true. Could one of you please confirm or deny?) So if machine R
should reboot while the journal is non-empty, it will replay it,
causing fairly unpredictable on-disk corruption.
Yup.
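If you want to check the log state before system R (re)mounts,
xfs_logprint from xfsprogs can dump it. A rough sketch (the device
name is a placeholder):

  # read-only inspection of the on-disk log
  xfs_logprint -t /dev/sdb1

A clean log shows no pending transactions; anything still sitting in
there is what a normal mount - even a read-only one - would replay.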
Post by Patrick J. LoPresti
Here are my questions.
1) When can a read-only XFS mount write to the disk, exactly?
Log recovery only. Use mount -o ro,norecovery to avoid that.
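That is (device and mount point are placeholders):

  mount -o ro,norecovery /dev/sdb1 /mnt/archive

or as an /etc/fstab entry:

  /dev/sdb1  /mnt/archive  xfs  ro,norecovery  0  0

Keep in mind that norecovery means a dirty log is simply not
replayed, so system R may see a stale or inconsistent view until the
log is clean.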
Post by Patrick J. LoPresti
2) If I do a "sync" on machine W (and perform no further writes), will
that truncate the journal?
FYI, the journal cannot be truncated - it is a fixed-size circular
log.

To get the log clean, I'd freeze the filesystem on system W while
system R mounts, e.g.:

system W                                system R
                                        unmount <fs>
write data
freeze fs
                                        mount -o ro,norecovery <fs>
unfreeze fs
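In command form, that's roughly (paths are placeholders):

  # system W: flush everything and quiesce; the log is clean while frozen
  xfs_freeze -f /mnt/archive

  # system R: mount while W is frozen
  mount -o ro,norecovery /dev/sdb1 /mnt/archive

  # system W: thaw, writes may resume
  xfs_freeze -u /mnt/archive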
Post by Patrick J. LoPresti
3) What am I missing?
1. NFS/CIFS. No need for shared access to the block device. NFS
works pretty well for read-only access, especially if you put a
dedicated 10GbE link between the two machines...
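A minimal read-only export would look something like this (hostnames
and paths are placeholders):

  # on system W, in /etc/exports:
  /mnt/archive  systemR(ro,no_subtree_check)

  # then apply it:
  exportfs -ra

  # on system R:
  mount -t nfs systemW:/mnt/archive /mnt/archive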

2. Snapshots. If you must share the block device, snapshot the
active filesystem and mount that readonly on system R - the snapshot
will be unchanging. When system W knows a snapshot is unmounted and
finished with, it can delete it. That is:


system W                                system R
write data
....
write data
snapshot <fs.ss2>
                                        umount <fs.ss1>
                                        mount -o ro,norecovery <fs.ss2>
delete snapshot <fs.ss1>
....
write data
....
write data
snapshot <fs.ss3>
                                        umount <fs.ss2>
                                        mount -o ro,norecovery <fs.ss3>
delete snapshot <fs.ss2>
....
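If the shared storage happens to sit on LVM, one cycle of the above
might look like this (volume group and volume names are placeholders):

  # system W: take the new snapshot of the active filesystem
  lvcreate --snapshot --name fs.ss2 --size 10G /dev/vg0/archive

  # system R: move from the old snapshot to the new one
  umount /mnt/archive
  mount -o ro,norecovery /dev/vg0/fs.ss2 /mnt/archive

  # system W: once R confirms it is off fs.ss1
  lvremove /dev/vg0/fs.ss1

The snapshot is taken while the origin is mounted, so its log may be
dirty - hence norecovery on system R's mount. Freezing briefly before
lvcreate (as above) would give you a clean snapshot instead.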

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Emmanuel Florac
2011-01-04 21:52:29 UTC
Permalink
Post by Patrick J. LoPresti
If your answer is "Please do not do this; get a clustered filesystem",
then trust me, you are preaching to the choir.
Just as a side note: OCFS2 works well, is present in all major
distros, is extremely fast to set up and install, and is only slightly
slower than XFS. There is absolutely no valid reason not to use it
(GFS, OTOH, is a complete PITA to set up).
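For the record, the setup really is short - something like this on
both nodes (cluster and device names are placeholders, and the
/etc/ocfs2/cluster.conf listing node names/IPs has to be written
first):

  # bring the cluster stack online
  /etc/init.d/o2cb online mycluster

  # once, from either node:
  mkfs.ocfs2 -L archive /dev/sdb1

  # on both nodes:
  mount -t ocfs2 /dev/sdb1 /mnt/archive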

Yeah I know, I know...
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <***@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------