Matrix robustness section
From Linux NFS
Revision as of 13:46, 19 May 2005
Section III
ROBUSTNESS TESTING
Basic stability assessments
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.A.1 | Run iozone for 2 weeks on basic client/server operations, using: | IOzone | Done | Bull | Now testing with fsstress and FFSB |
III.A.2 | Run automounter use case for 2 weeks on amd, autofs, and autong, using: | e.g. Crashme more | New | | |
III.A.3 | Run NFS server for 2 weeks with random configuration changes, using: | | Open | OSDL | |
III.A.4 | Run connectathon locking tests against NFS server for 2 weeks, using: | | New | | |
III.A.5 | Run fsstress for 2 weeks on basic client/server operations, using: | fsstress | Done | Bull | 1 week |
III.A.6 | Run FFSB for 1 day on basic client/server operations in stress configuration, using: | ffsb | Done | Bull | 1 day |
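
The long-running items above all amount to re-running a workload until a deadline and counting failures. A minimal sketch of such a harness follows; the function name `soak` and its parameters are hypothetical, and the command to run (iozone, fsstress, ffsb, ...) is left to the caller because the matrix does not give the exact invocations.

```python
import subprocess
import time


def soak(cmd, duration_s, pause_s=0.0):
    """Re-run `cmd` until `duration_s` has elapsed; return (runs, failures).

    `cmd` is an argument list for subprocess.run. A non-zero exit status
    counts as one failure. The command always runs at least once.
    """
    deadline = time.monotonic() + duration_s
    runs = failures = 0
    while True:
        result = subprocess.run(cmd)
        runs += 1
        if result.returncode != 0:
            failures += 1
        if time.monotonic() >= deadline:
            return runs, failures
        time.sleep(pause_s)
```

For a two-week run, `duration_s` would be `14 * 24 * 3600`; logging each failure with a timestamp is a natural extension.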
Resource limit testing
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.B.1 | Test stability of client in out-of-PID situation | | | | |
III.B.2 | Test stability of client in out-of-memory situation | valgrind | New | | |
III.B.3 | Test stability of client when the server is out of disk space | dd, fsstress | Done | Bull | Simple error message: "No space left on device" |
III.B.4 | Test stability of client in out-of-inode situation | | | | |
III.B.5 | Test stability of client in out-of-swap-space situation | | | | |
III.B.6 | Test stability of server in out-of-PID situation | | | | |
III.B.7 | Test stability of server in out-of-memory situation | valgrind | New | | |
III.B.8 | Test stability of server when out of disk space | dd, fsstress | Done | Bull | Simple error message: "No space left on device" |
III.B.9 | Test stability of server in out-of-inode situation | | | | |
III.B.10 | Test stability of server in out-of-swap-space situation | | | | |
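
The out-of-disk-space cases (III.B.3 and III.B.8) can be approximated by writing until the file system reports `ENOSPC`. The sketch below is an illustration only, not the dd/fsstress procedure used by Bull; the function name and defaults are hypothetical. It also calls `fsync`, on the assumption that an NFS client may report the error only when cached data is flushed.

```python
import errno
import os


def fill_until_enospc(path, chunk_size=1 << 20, max_bytes=1 << 30):
    """Write zero-filled chunks to `path` until ENOSPC or `max_bytes`.

    Returns (bytes_written, hit_enospc). Any error other than ENOSPC
    is re-raised, since it would indicate an unexpected failure mode.
    """
    chunk = b"\0" * chunk_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        while written < max_bytes:
            written += os.write(fd, chunk)
        os.fsync(fd)  # flush cached writes so a deferred error surfaces here
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        return written, True
    finally:
        os.close(fd)
    return written, False
```

On Linux, pointing the function at `/dev/full` exercises the `ENOSPC` branch without filling a real disk.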
Stress load testing
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.C.1 | Run stress tools in a standard configuration on each release | fsx, fsstress, ffsb | In progress | Bull | fsstress and ffsb are run for 1 hour |
III.C.2 | Analyze load balancing, failure modes, etc. under different stress loads | | New | | |
III.C.3 | Destructive testing by measuring point of failure for various loads | | New | | |
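
One way to approach III.C.3 is to step through increasing load levels and record the last level that passed and the first that failed. The sketch below assumes a hypothetical callback `run_at_load` that applies one load level (for example, a number of concurrent fsstress processes) and returns `True` on success; neither the callback nor the load units come from the matrix.

```python
def find_failure_point(run_at_load, loads):
    """Return (last_ok, first_failed) over the load levels in ascending order.

    `first_failed` is None if every level passed; `last_ok` is None if
    the lowest level already failed.
    """
    last_ok = None
    for load in sorted(loads):
        if not run_at_load(load):
            return last_ok, load
        last_ok = load
    return last_ok, None
```

Once a failing level is found, the interval between the two returned values can be searched more finely to narrow down the point of failure.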
Scalability (robustness)
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.D.1 | Find maximum number of connections to Linux IA-32 server | fsstress, fsx | New | Bull (partial) | |
III.D.2 | Find maximum number of files for Linux IA-32 exported file system | ffsb | New | | |
III.D.3 | Find maximum file size on Linux IA-32 | | New | | |
III.D.4 | Find maximum number of mounted file systems on client | fsstress, fsx | New | Bull | |
III.D.5 | Test robustness on NUMA when scaling CPU, memory, NIC, or disk count | fsstress, fsx | New | | |
III.D.6 | Test robustness on SMP when scaling CPU, memory, NIC, or disk count | fsstress, fsx | New | Bull (partial) | |
III.D.7 | Test correctness of NFS client when backed by a large (>100GB) cachefs | | New | | |
III.D.8 | Find maximum number of exported file systems on server | | New | | |
III.D.9 | Find maximum size of exported file systems on server | | New | | |
Recovery from problems while under light/normal/heavy loads
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.E.1 | Test short & long term local network failure (unplugged cable, ifdown eth0, etc.) | | Open | OSDL | |
III.E.2 | Test short & long duration remote network partition | | Open | OSDL | |
III.E.3 | Test behavior during crash/reboot of server with clients holding various states | | Open | OSDL | more |
III.E.4 | Test multiple clients using, locking, etc. same files | | New | | |
III.E.5 | Test behavior of server with failed storage device | | New | | |
III.E.6 | Test behavior during crash of client with open delegations and locks | | New | | |
III.E.7 | Test recovery from denied permission | | New | | |
III.E.8 | Test recovery from JUKEBOX/DELAY | | New | | |
III.E.9 | Test recovery from ESTALE | | New | | |
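
For III.E.9, one application-level recovery pattern is to re-open the file by name when a stale file handle is reported, which forces a fresh lookup. The following is a minimal sketch under that assumption; the function name, the retry policy, and the injectable `opener` parameter (used to simulate `ESTALE` without a real NFS mount) are all hypothetical.

```python
import errno
import time


def read_with_estale_retry(path, retries=3, delay=0.0, opener=open):
    """Read `path` in full, re-opening it by name after an ESTALE error.

    Gives up and re-raises after `retries` retries, or immediately on
    any error other than ESTALE.
    """
    for attempt in range(retries + 1):
        try:
            with opener(path, "rb") as f:
                return f.read()
        except OSError as exc:
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            time.sleep(delay)  # give the server or automounter time to settle
```

A test would typically provoke the stale handle on the server side (for example, by replacing the file while the client holds it open) and then check that the read eventually succeeds or fails cleanly.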
Race conditions
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.F.1 | Test for race conditions and locking bugs on PPC64 | | New | (Polyserve?) | Olaf Kirch says PPC64 is good at exposing problems because of its weak CPU cache coherency semantics |
III.F. | Test for race conditions on new architectures | | New | (Polyserve?) | Faster CPU, memory, and buses can expose race conditions |
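
A common way to expose locking bugs is a shared counter: several processes each take an exclusive lock, read the counter, increment it, and write it back. If locking works, the final value equals the total number of increments. The sketch below is an illustration, not a tool named in the matrix; the function name and parameters are hypothetical, and it uses POSIX `fcntl` locks from local processes. Running the same worker from several NFS clients against one exported file would be the multi-client variant.

```python
import subprocess
import sys

# Worker run in each child process: lock, read, increment, write back, unlock.
WORKER = r"""
import fcntl, sys
path, n = sys.argv[1], int(sys.argv[2])
for _ in range(n):
    with open(path, "r+") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)   # blocks until the lock is granted
        value = int(f.read() or 0)
        f.seek(0)
        f.write(str(value + 1))
        f.truncate()
        f.flush()                       # push data out before releasing the lock
        fcntl.lockf(f, fcntl.LOCK_UN)
"""


def locked_counter_test(path, procs=4, increments=50):
    """Return the final counter after `procs` processes each add `increments`."""
    with open(path, "w") as f:
        f.write("0")
    children = [
        subprocess.Popen([sys.executable, "-c", WORKER, path, str(increments)])
        for _ in range(procs)
    ]
    for child in children:
        if child.wait() != 0:
            raise RuntimeError("worker exited with an error")
    with open(path) as f:
        return int(f.read())
```

A result lower than `procs * increments` indicates lost updates, which points at a locking or cache-coherency problem.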
Automounter robustness
For more info about Automounter, see notes in nfsv4 list archive for 2/16/05
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
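III.G.1 | Test interruptible automounting in the following cases: indirect mount, direct mount, browsed mount, multimount offset | | New | | |
III.G.2 | Run concurrent access tests for races in the automounter, with multiple threads working in parallel | | New | | |
III.G.3 | Test replicated file system selection | | New | | |
III.G.4 | Test remounting after expire corner cases: something (a process) sitting in the scaffolding; common case for /net | | New | | |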
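
For the concurrent-access item (III.G.2), a simple starting point is to touch many automount-managed paths from several threads at once, since accessing a path under an automount point is what triggers the mount. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the caller supplies the paths (for example, entries under `/net`).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def hammer_paths(paths, workers=8, rounds=10):
    """stat() every path `rounds` times from a thread pool.

    Returns a list of (path, errno) tuples, one per failed stat() call.
    """
    def probe(path):
        try:
            os.stat(path)
            return None
        except OSError as exc:
            return (path, exc.errno)

    work = [p for _ in range(rounds) for p in paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(probe, work))
    return [r for r in results if r is not None]
```

An empty result means every access succeeded; hangs or sporadic errors under load are the symptoms this kind of test is meant to surface.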