Val Henson
Guide to Linux File Systems
Choosing and tuning the right file system for your workload

XFS is the only Linux FS to support more than 1TB reliably
'iostat' -- useful tool
No single best file system; the choice is workload-dependent
Factor of 10^6 in time between CPU/memory ops and I/O ops -- ns versus ms

How FS like to be treated
  Mostly reads
  Large, contiguous IO
  Medium-sized files -- 4K-128K
  Medium-sized dirs -- 10-1000 entries
  Most IO near beginning of file
  Few metadata ops
  Clean unmount

How to abuse your FS
  Fill one dir with a million files
  Simultaneously create one huge file with remaining space
  Randomly create and delete small files in same dir
  Randomly read and write single bytes of the large file
  Add and remove ACL/extended attribs
  Slowly yank the power plug

Differences between FS's
  File system and file size
  Number of inodes
  Dir size and lookup algorithm
  File data R/W performance
  File create/delete performance
  Space efficiency
  Special features -- direct IO, execute in place, etc
  Crash recovery method
  Ease of repair
  Stability
  Support

  ext2: simple, fast, stable, slow recovery, easy to repair
  ext3: rock stable, fast recovery, slow metadata ops
  reiser3: lots of small files, big dirs, less stable, poor repair, less support
  xfs: large files, big dirs, big FS's, slow repair
  jfs: end-of-life'd by IBM
  others: less well tested, poor support

Common workloads
  embedded
    avoid writing flash unless necessary
    ext2 (for read-only) / ext3, minix for ramdisks
    jffs2 for flash without write-balancing (modern flash _has_ write-balancing)
  laptop
    must withstand frequent crashes
    low performance demands
    ext3 is best
    eliminate writes as much as possible
      mount -o noatime,nodiratime
      group writes with laptop mode, read Documentation/laptop-mode.txt
  desktop
    sweet spot of most FS's
    ext3 or reiser
    reiser notail option improves performance at cost of efficient storage
    large file working set?
      increase # of inodes cached in memory, see Documentation/sysctl/fs.txt
  file server
    ext3 for few metadata ops
    reiser for more metadata ops, small files
    xfs for large streaming reads/writes, large dirs
    ext3: data=writeback is faster, but trades away data integrity after a crash
    ext3: data=journal reduces latency of sync NFS writes
    ext3: default is data=ordered
    can tweak block size
    for very high performance, consider ext2
      some cluster FS's use ext2 as the per-node base
  mail server
    mbox format (one big file) -- ext3
    maildir format (lots of small files) -- reiser
    ext3 with small blocks and a high inode-to-file ratio can be good for maildir
    don't cut any corners on your mail server -- reliability is key
  database server
    ocfs2 for clustered Oracle databases
    direct IO often important
    tuning FS's for DB's is an arcane art
  video server
    large files, write-once, read-many streaming access
    XFS clear winner
    ext3 with larger reservation could work

NFS tuning tips
  raise r/w size to ~8192 (8K)
  use NFSv3 and TCP (not UDP)
  async raises write performance but could cause problems in a crash -- no longer the default
  Can't recommend NFSv4 yet

Distributed FS's
  tradeoff of latency vs consistency
  most are buggy and slow
  use one optimized for a single case
    NFS -- multi read, single write
    OCFS2 -- DB's
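
The laptop advice to mount with noatime,nodiratime can be checked against a running system. What follows is a minimal sketch, not something from the talk: the function name mounts_missing_noatime, the default list of file system types, and the sample mount table are all assumptions made for illustration. It relies only on the usual /proc/mounts field order (device, mount point, type, options).

```python
def mounts_missing_noatime(mounts_text,
                           fs_types=("ext2", "ext3", "reiserfs", "xfs", "jfs")):
    """Return mount points of the given fs types whose options lack noatime.

    mounts_text is text in /proc/mounts format: one mount per line, with
    whitespace-separated fields (device, mount point, type, options, ...).
    The fs_types default is an illustrative assumption, not a fixed list.
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Compare whole option names so "relatime" does not count as "noatime".
        if fs_type in fs_types and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical mount table, made up for this example.
SAMPLE = (
    "/dev/sda1 / ext3 rw,noatime,data=ordered 0 0\n"
    "/dev/sda2 /home ext3 rw,data=ordered 0 0\n"
    "tmpfs /tmp tmpfs rw 0 0\n"
)

if __name__ == "__main__":
    print(mounts_missing_noatime(SAMPLE))  # prints ['/home']
```

On a real machine the same function could be fed the contents of /proc/mounts instead of the sample string. Each mount point it reports is a candidate for remounting with -o noatime,nodiratime.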