From Linux NFS
Section III
ROBUSTNESS TESTING
Basic stability assessments
ID | Test | Test tool | Status | Owner | Notes |
III.A.1 | Run IOzone for 2 weeks on basic client/server operations, using: both data and metadata options; cached and direct I/O; various mount options | IOzone | Done | Bull | Now testing with fsstress and FFSB |
III.A.2 | Run the automounter use case for 2 weeks on amd, autofs, and autong, using: a large number of maps; random mounts, with workloads run on the automounted partition; a variety of workloads, such as randomly chosen file system tests | e.g. crashme (more) | New | | |
III.A.3 | Run the NFS server for 2 weeks with random configuration changes: interrupt the server in various ways (reboot, power cycle, LAN failure); change and re-export export rules at random; trigger a client workload at arbitrary times; analyze client recovery behavior | | Open | OSDL | |
III.A.4 | Run the Connectathon locking tests against an NFS server for 2 weeks, using multiple client machines and random reboots; analyze client cache coherency and locking behavior | | New | | |
III.A.5 | Run fsstress for 2 weeks on basic client/server operations, using: long lists of random operations (1,000 operations); a high number of processes (100) | fsstress | Done | Bull | 1 week |
III.A.6 | Run FFSB for 1 day on basic client/server operations in a stress configuration, using: 1,200,000 files; 100 directories | FFSB | Done | Bull | 1 day |
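III.A.5 exercises the client with random operations. The minimal loop below illustrates that kind of workload: it draws operations from a small set and applies them to a shared pool of names, so creates, writes, and removals collide. It is a sketch, not fsstress itself. The operation set, the 32-name pool, and the name `random_ops` are assumptions. For real runs, the plan's parameters correspond to fsstress's `-n` (operations per process) and `-p` (number of processes) options, with `-d` pointing at the NFS mount.

```python
import os
import random
import tempfile

# Illustrative subset; real fsstress supports many more operations.
OPS = ("create", "write", "read", "remove", "mkdir", "stat")

# Errors expected when random operations collide on the shared name pool.
EXPECTED = (FileNotFoundError, FileExistsError, IsADirectoryError, NotADirectoryError)


def random_ops(root, n_ops=1000, seed=None, pool=32):
    """Run n_ops random file-system operations under root.

    Returns a dict mapping each operation name to how often it was attempted.
    Unexpected errors (anything not in EXPECTED) propagate to the caller.
    """
    rng = random.Random(seed)
    counts = {op: 0 for op in OPS}
    names = [os.path.join(root, f"f{i}") for i in range(pool)]
    for _ in range(n_ops):
        op = rng.choice(OPS)
        path = rng.choice(names)
        counts[op] += 1
        try:
            if op == "create":
                open(path, "a").close()
            elif op == "write":
                with open(path, "ab") as f:
                    f.write(os.urandom(rng.randint(1, 4096)))
            elif op == "read":
                with open(path, "rb") as f:
                    f.read()
            elif op == "remove":
                # Directories are always empty here, so rmdir is safe.
                if os.path.isdir(path):
                    os.rmdir(path)
                else:
                    os.unlink(path)
            elif op == "mkdir":
                os.mkdir(path)
            elif op == "stat":
                os.stat(path)
        except EXPECTED:
            pass
    return counts


if __name__ == "__main__":
    # In a real run, root would be a directory on the NFS mount under test.
    with tempfile.TemporaryDirectory() as d:
        print(random_ops(d, n_ops=1000, seed=42))
```

Running several copies in separate subdirectories of the mount roughly approximates the 100-process configuration.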
Resource limit testing
ID | Test | Test tool | Status | Owner | Notes |
III.B.1 | Test stability of the client in an out-of-PID situation | | | | |
III.B.2 | Test stability of the client in an out-of-memory situation | valgrind | New | | |
III.B.3 | Test stability of the client when the server is out of disk space | dd, fsstress | Done | Bull | Only a simple error message: "No space left on device" |
III.B.4 | Test stability of the client in an out-of-inode situation | | | | |
III.B.5 | Test stability of the client in an out-of-swap-space situation | | | | |
III.B.6 | Test stability of the server in an out-of-PID situation | | | | |
III.B.7 | Test stability of the server in an out-of-memory situation | valgrind | New | | |
III.B.8 | Test stability of the server in an out-of-disk-space situation | dd, fsstress | Done | Bull | Only a simple error message: "No space left on device" |
III.B.9 | Test stability of the server in an out-of-inode situation | | | | |
III.B.10 | Test stability of the server in an out-of-swap-space situation | | | | |
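III.B.3 and III.B.8 record only that the client sees "No space left on device". A harness could make that check explicit. It writes until ENOSPC and treats any other error as an unexpected failure mode. This sketch assumes a file-like object; `write_until_enospc` and its `sync` hook are hypothetical names. On NFS, a full server may be reported only when cached data is flushed, so for real files a hook such as `sync=lambda f: os.fsync(f.fileno())` is advisable.

```python
import errno


def write_until_enospc(f, chunk=b"\0" * (1 << 20), max_bytes=None, sync=None):
    """Append `chunk` to file object `f` until a write fails with ENOSPC.

    Returns (bytes_written, hit_enospc). Stops early with hit_enospc=False
    once max_bytes is reached. Any other OSError is re-raised so the test
    records it as an unexpected failure mode.
    """
    written = 0
    while max_bytes is None or written < max_bytes:
        try:
            n = f.write(chunk)
            f.flush()
            if sync is not None:
                sync(f)  # e.g. lambda f: os.fsync(f.fileno()) to surface deferred errors
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return written, True
            raise
        written += n
    return written, False
```

Re-raising non-ENOSPC errors keeps "graceful failure" and "some other breakage" apart in the results.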
Stress load testing
ID | Test | Test tool | Status | Owner | Notes |
III.C.1 | Run stress tools in a standard configuration on each release | fsx, fsstress, FFSB | In progress | Bull | fsstress and FFSB are run for 1 hour |
III.C.2 | Analyze load balancing, failure modes, etc. under different stress loads | | New | | |
III.C.3 | Destructive testing: measure the point of failure for various loads | | New | | |
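For III.C.3, one way to locate a point of failure is to step up the load. The search stops at the first level whose error rate exceeds a threshold. The sketch below uses threads as the unit of load and calls a caller-supplied `workload(level)` once per worker. The function name and the thread-based load model are assumptions; a real run would drive fsx, fsstress, or FFSB at each level instead.

```python
from concurrent.futures import ThreadPoolExecutor


def find_failure_point(workload, levels, max_error_rate=0.0):
    """Run `workload(level)` on `level` concurrent workers for each load level.

    Returns (level, errors) for the first level whose error rate exceeds
    max_error_rate, or (None, 0) if every level stays within it.
    """
    for level in levels:
        with ThreadPoolExecutor(max_workers=level) as pool:
            futures = [pool.submit(workload, level) for _ in range(level)]
        # Leaving the with-block waits for every worker to finish.
        errors = sum(1 for fut in futures if fut.exception() is not None)
        if errors / level > max_error_rate:
            return level, errors
    return None, 0
```

Geometric levels (1, 2, 4, ...) find the rough breaking point quickly. A finer sweep between the last passing level and the first failing one can then narrow it down.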
Scalability (robustness)
ID | Test | Test tool | Status | Owner | Notes |
III.D.1 | Find the maximum number of connections to a Linux IA-32 server | fsstress, fsx | New | Bull (partial) | |
III.D.2 | Find the maximum number of files for a Linux IA-32 exported file system | | Open | | |
III.D.3 | Find the maximum file size on Linux IA-32 | ad hoc tool | Near done | Bull | Size is less than 8 TB; to be verified, completed, and enhanced |
III.D.4 | Find the maximum number of mounted file systems on the client | ad hoc tool | Near done | Bull | More than 13,000 |
III.D.5 | Test robustness on NUMA when scaling CPU, memory, NIC, or disk count | fsstress, fsx | New | | |
III.D.6 | Test robustness on SMP when scaling CPU, memory, NIC, or disk count | fsstress, fsx | New | Bull (partial) | |
III.D.7 | Test correctness of the NFS client when backed by a large (>100 GB) cachefs | | New | | |
III.D.8 | Find the maximum number of exported file systems on the server | ad hoc tool | New | Bull | No limit reached, but the export process becomes unacceptably slow after 3,000 exports |
III.D.9 | Find the maximum size of exported file systems on the server | | New | | |
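A probe for III.D.3 could ask the file system for the largest size it accepts without writing any data. The steps are:

1. Extend a file with `os.truncate()`, which produces a sparse file.
2. Double the size until the call fails with EFBIG or EINVAL.
3. Binary-search between the last accepted size and the first rejected one.

This is a sketch under several assumptions. Acceptance is assumed to be monotonic in size, `max_sparse_file_size` is a hypothetical name, and `cap` (16 TiB by default) bounds the search. Pointing `directory` at an NFS mount measures what the client/server pair accepts for a size change, which may differ from the limit for actually writing data.

```python
import errno
import os
import tempfile


def max_sparse_file_size(directory, cap=1 << 44):
    """Largest size in bytes (at most `cap`) that os.truncate() accepts."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)

    def fits(size):
        try:
            os.truncate(path, size)  # sparse: no data blocks are written
            return True
        except OSError as e:
            if e.errno in (errno.EFBIG, errno.EINVAL):
                return False
            raise

    try:
        lo, hi = 0, 1
        # Double until a size is rejected or the cap is exceeded.
        while hi <= cap and fits(hi):
            lo, hi = hi, hi * 2
        hi = min(hi, cap + 1)  # hi is now an exclusive upper bound
        # Binary search: lo is known to fit; hi is rejected or beyond cap.
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if fits(mid):
                lo = mid
            else:
                hi = mid
        return lo
    finally:
        os.unlink(path)
```

The same doubling-then-bisect search could drive the counts in III.D.4 and III.D.8 if each probe mounts or exports N file systems and reports success or failure.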
Recovery from problems while under light/normal/heavy loads
ID | Test | Test tool | Status | Owner | Notes |
III.E.1 | Test short- and long-term local network failure (unplugged cable, ifdown eth0, etc.) | | Open | OSDL | |
III.E.2 | Test short- and long-duration remote network partitions | | Open | OSDL | |
III.E.3 | Test behavior during a crash/reboot of the server while clients hold various states | | Open | OSDL | more |
III.E.4 | Test multiple clients using, locking, etc. the same files | | New | | |
III.E.5 | Test behavior of the server with a failed storage device | | New | | |
III.E.6 | Test behavior during a crash of a client holding open delegations and locks | | New | | |
III.E.7 | Test recovery from denied permissions | | New | | |
III.E.8 | Test recovery from JUKEBOX/DELAY (NFS3ERR_JUKEBOX, NFS4ERR_DELAY) | | New | | |
III.E.9 | Test recovery from ESTALE | | New | | |
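For III.E.8 and III.E.9, a test harness needs a consistent rule for what counts as recovery. One option is to retry an operation with exponential backoff when it fails with an errno from a configurable set, and to report how many retries were needed. Which errors actually reach applications depends on the client and the mount options, so the default set (ESTALE only) is an assumption. `call_with_recovery` is a hypothetical name, and `sleep` is injectable so the backoff can be checked without waiting.

```python
import errno
import time


def call_with_recovery(op, retry_errnos=frozenset({errno.ESTALE}), attempts=5,
                       base_delay=0.1, sleep=time.sleep):
    """Call op(); if it raises OSError with an errno in retry_errnos, back off
    exponentially (base_delay, 2*base_delay, ...) and try again.

    Returns (result, retries_used). Non-retryable errors are re-raised
    immediately and the last error is re-raised once attempts run out, so
    the test can record them as failed recovery.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op(), attempt
        except OSError as e:
            if e.errno not in retry_errnos or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

For ESTALE, `op` would typically re-resolve the path by name (for example reopening it) rather than reuse an old file descriptor.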
Race conditions
ID | Test | Test tool | Status | Owner | Notes |
III.F.1 | Test for race conditions and locking bugs on PPC64 | | New | (Polyserve?) | Olaf Kirch says PPC64 is good at exposing problems because of its weak CPU cache coherency semantics |
III.F.2 | Test for race conditions on new architectures | | New | (Polyserve?) | Faster CPUs, memory, and buses can expose race conditions |
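Here is a simple race probe in the spirit of III.F.1 and III.F.2. In each round, a barrier releases several threads at once, and they race to create the same file with `O_CREAT | O_EXCL`. Exactly one should win per round; any other count points to a broken exclusive create. This is an illustrative sketch: the function name, round count, and thread count are assumptions, and it exercises only one kind of race.

```python
import os
import tempfile
import threading


def exclusive_create_race(directory, rounds=50, threads=8):
    """Race `threads` workers to create the same file with O_CREAT|O_EXCL.

    Returns the number of winners in each round; correct behavior is 1.
    """
    winners_per_round = []
    for r in range(rounds):
        path = os.path.join(directory, f"race-{r}")
        barrier = threading.Barrier(threads)
        winners = []
        lock = threading.Lock()

        def worker():
            barrier.wait()  # release all workers together to maximize overlap
            try:
                fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            except FileExistsError:
                return  # lost the race, as expected for all but one worker
            os.close(fd)
            with lock:
                winners.append(threading.get_ident())

        workers = [threading.Thread(target=worker) for _ in range(threads)]
        for t in workers:
            t.start()
        for t in workers:
            t.join()
        winners_per_round.append(len(winners))
        try:
            os.unlink(path)  # clean up so the next round starts fresh
        except FileNotFoundError:
            pass  # no winner this round, which the result already records
    return winners_per_round


if __name__ == "__main__":
    # In a real run, point this at a directory on the NFS mount under test.
    with tempfile.TemporaryDirectory() as d:
        print(exclusive_create_race(d))
```

Running the same probe from several clients against one export would extend it from thread-level races to client-level races.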
Automounter robustness
For more information about the automounter, see the notes in the nfsv4 mailing list archive for 2/16/05.
ID | Test | Test tool | Status | Owner | Notes |
III.G.1 | Test interruptible automounting in the following cases: indirect mount; direct mount; browsed mount; multimount offset | | New | | |
III.G.2 | Run concurrent access tests for races in the automounter, with multiple threads working in parallel | | New | | |
III.G.3 | Test replicated file system selection | | New | | |
III.G.4 | Test remounting-after-expire corner cases: something (a process) sitting in the scaffolding; the common case for /net | | New | | Needs to be supported at the NFS level |
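III.G.2 could start from something like the following. A barrier releases many threads at the same moment so they all look up the same automounted path. The harness then checks that every thread saw the same directory listing and records any errors. The function name is hypothetical. The path to use depends on the automounter map under test; a /net entry is one example.

```python
import os
import threading


def concurrent_listdir(path, threads=16):
    """Have `threads` workers list `path` at the same instant (for example,
    to trigger an automount concurrently).

    Returns (distinct_listings, errnos): a set of sorted listings and a list
    of errno values from failed attempts.
    """
    barrier = threading.Barrier(threads)
    listings, errors = [], []
    lock = threading.Lock()

    def worker():
        barrier.wait()  # release all workers together
        try:
            entries = tuple(sorted(os.listdir(path)))
        except OSError as e:
            with lock:
                errors.append(e.errno)
            return
        with lock:
            listings.append(entries)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return set(listings), errors
```

For example, `concurrent_listdir("/net/server1")`, where `server1` is a placeholder host, would be expected to return a single distinct listing and no errors from a healthy automounter. More than one listing, or errors from only some threads, would suggest a race in the mount path.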