Matrix robustness section
Revision as of 22:37, 19 May 2005
Section III
ROBUSTNESS TESTING
Basic stability assessments
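ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.A.1 | Run iozone for 2 weeks on basic client/server operations, using: | IOzone | Done | Bull | Now testing with fsstress and FFSB |
III.A.2 | Run automounter use case for 2 weeks on amd, autofs, and autong, using: large number of maps; randomly mount and run workloads on an automounted partition; use a variety of workloads, such as randomly chosen fs tests | e.g. Crashme (more: http://people.delphiforums.com/gjc/crashme.html) | New | | |
III.A.3 | Run NFS server for 2 weeks with random configuration changes, using: interrupt server in various ways (reboot, power cycle, LAN failure); change/re-export export rules at random; trigger a client workload at arbitrary times; analyze client recovery behaviors | | Open | OSDL | |
III.A.4 | Run connectathon locking tests against NFS server for 2 weeks, using: multiple client machines; reboot at random; analyze client cache coherency behaviors; analyze locking behaviors | | New | | |
III.A.5 | Run fsstress for 2 weeks on basic client/server operations, using: | fsstress | Done | Bull | 1 week |
III.A.6 | Run FFSB for 1 day on basic client/server operations in stress configuration, using: | ffsb | Done | Bull | 1 day |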
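
A long run with random interruptions (reboot, power cycle, LAN failure, export changes, client workloads), as in III.A.3, is easier to debug when the exact sequence of events can be replayed. The following is a minimal sketch of one way to do that, assuming a seeded schedule is acceptable. The name `build_schedule`, the gap bounds, and the action labels are hypothetical and not part of any tool listed in this matrix; how each action is actually carried out is left to the test rig.

```python
import random

# Hypothetical labels for the kinds of events a III.A.3-style run injects.
ACTIONS = ["reboot", "power_cycle", "lan_fail", "reexport", "client_workload"]

TWO_WEEKS = 14 * 24 * 3600  # run length in seconds


def build_schedule(seed, duration=TWO_WEEKS, min_gap=600, max_gap=6 * 3600):
    """Return a list of (offset_seconds, action) pairs covering `duration`.

    The same seed always yields the same schedule, so a failing run
    can be replayed event for event.
    """
    rng = random.Random(seed)
    schedule = []
    t = rng.randint(min_gap, max_gap)
    while t < duration:
        schedule.append((t, rng.choice(ACTIONS)))
        t += rng.randint(min_gap, max_gap)
    return schedule
```

Logging the seed alongside the server and client logs would let a later run reproduce the same order and spacing of events.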
Resource limit testing
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.B.1 | Test stability of client when out of PIDs | | | | |
III.B.2 | Test stability of client when out of memory | valgrind | New | | |
III.B.3 | Test stability of client when the server is out of disk space | dd, fsstress | Done | Bull | Simple error message: "No space left on device" |
III.B.4 | Test stability of client when out of inodes | | | | |
III.B.5 | Test stability of client when out of swap space | | | | |
III.B.6 | Test stability of server when out of PIDs | | | | |
III.B.7 | Test stability of server when out of memory | valgrind | New | | |
III.B.8 | Test stability of server when out of disk space | dd, fsstress | Done | Bull | Simple error message: "No space left on device" |
III.B.9 | Test stability of server when out of inodes | | | | |
III.B.10 | Test stability of server when out of swap space | | | | |
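
The disk-space rows (III.B.3 and III.B.8) record that the observed behavior was a simple "No space left on device" error. A sketch of how a client-side check might confirm that the failure is ENOSPC rather than a hang or some other errno is shown below. The names `fill_until_full` and `is_clean_enospc`, the chunk size, and the chunk cap are hypothetical; in a real run `path` would point at a file on the NFS mount under test.

```python
import errno
import os


def fill_until_full(path, chunk_size=1 << 20, max_chunks=1 << 20):
    """Append zero-filled chunks to `path` until a write fails.

    Returns the errno of the failure, or None if `max_chunks` chunks
    were written without error. Over NFS a write error may only surface
    at fsync or close time, so both happen inside the try block.
    """
    chunk = b"\0" * chunk_size
    try:
        with open(path, "ab") as f:
            for _ in range(max_chunks):
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())
    except OSError as e:
        return e.errno
    return None


def is_clean_enospc(err):
    """True when the failure is the expected 'No space left on device'."""
    return err == errno.ENOSPC
```

Calling `os.fsync` after every chunk is slow; it is done here only so the error is attributed to the chunk that caused it.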
Stress load testing
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.C.1 | Run stress tools in a standard configuration on each release | fsx, fsstress, ffsb | In progress | Bull | fsstress and ffsb are run for 1 hour |
III.C.2 | Analyze load balancing, failure modes, etc. under different stress loads | | New | | |
III.C.3 | Destructive testing by measuring point of failure for various loads | | New | | |
Scalability (robustness)
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.D.1 | Find maximum number of connections to Linux IA-32 server | fsstress, fsx | New | Bull (partial) | |
III.D.2 | Find maximum number of files for Linux IA-32 exported file system | ffsb | New | | |
III.D.3 | Find maximum file size on Linux IA-32 | | New | | |
III.D.4 | Find maximum number of mounted file systems on client | fsstress, fsx | New | Bull | |
III.D.5 | Test robustness on NUMA when scaling CPU, memory, NIC, or disk count | fsstress, fsx | New | | |
III.D.6 | Test robustness on SMP when scaling CPU, memory, NIC, or disk count | fsstress, fsx | New | Bull (partial) | |
III.D.7 | Test correctness of NFS client when backed by a large (>100 GB) cachefs | | New | | |
III.D.8 | Find maximum number of exported file systems on server | | New | | |
III.D.9 | Find maximum size of exported file systems on server | | New | | |
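
Several rows in this table ask for a maximum (connections, files, mounted file systems). One way to locate such a limit without probing every value is to double the load until something fails and then bisect between the last success and the first failure. The sketch below assumes monotonic behavior: everything below the limit works and everything above it fails, which a real server does not guarantee. The names `find_max_supported` and `probe` are hypothetical; `probe(n)` stands for "set up n connections (or files, or mounts) and report whether the system stayed healthy".

```python
def find_max_supported(probe, ceiling=1 << 20):
    """Return the largest n in [0, ceiling] for which probe(n) is True.

    Assumes that if probe(n) fails, probe(m) also fails for every m > n.
    A load of 0 is treated as trivially supported.
    """
    lo, hi = 0, 1  # lo: largest load known to work, hi: next candidate
    # Phase 1: double the load until a probe fails or the ceiling is passed.
    while hi <= ceiling and probe(hi):
        lo, hi = hi, hi * 2
    hi = min(hi, ceiling + 1)  # first load known (or defined) to be out of range
    # Phase 2: bisect between the last success and the first failure.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `find_max_supported(lambda n: n <= 1000)` returns 1000 after roughly twenty probes instead of a thousand. Because real limits can be flaky near the edge, repeating the probe a few times at the reported value is a sensible follow-up.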
Recovery from problems while under light/normal/heavy loads
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.E.1 | Test short- and long-term local network failure (unplugged cable, ifdown eth0, etc.) | | Open | OSDL | |
III.E.2 | Test short- and long-duration remote network partition | | Open | OSDL | |
III.E.3 | Test behavior during crash/reboot of server with clients holding various states | | Open | OSDL | more |
III.E.4 | Test multiple clients using, locking, etc. the same files | | New | | |
III.E.5 | Test behavior of server with failed storage device | | New | | |
III.E.6 | Test behavior during crash of client with open delegations and locks | | New | | |
III.E.7 | Test recovery from denied permission | | New | | |
III.E.8 | Test recovery from JUKEBOX/DELAY | | New | | |
III.E.9 | Test recovery from ESTALE | | New | | |
Race conditions
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.F.1 | Test for race conditions and locking bugs on PPC64 | | New | (Polyserve?) | Olaf Kirch says PPC64 is good at exposing problems because of its weak CPU cache coherency semantics |
III.F. | Test for race conditions on new architectures | | New | (Polyserve?) | Faster CPU, memory, and buses can expose race conditions |
Automounter robustness
For more information about the automounter, see the notes in the nfsv4 list archive for 2/16/05.
ID | test | tool test | status | owner | notes |
---|---|---|---|---|---|
III.G.1 | Test interruptible automounting in the following cases: indirect mount; direct mount; browsed mount; multimount offset | | New | | |
III.G.2 | Run concurrent access tests for races in automounter | | New | | |
III.G.3 | Test replicated file system selection | | New | | |
III.G.4 | Test remounting after expire corner cases: something (a process) sitting in the scaffolding; common case for /net | | New | | Needs to be supported at nfs level |
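
For the concurrent access row (III.G.2), a simple harness can have many threads touch the same set of paths at the same moment, so that several lookups may reach the same mount trigger together. The following is a sketch under stated assumptions: `hammer_paths`, the thread count, and the round count are hypothetical, and `os.stat` stands in for whatever access pattern the real test would use. In practice `paths` would name entries under an automount point.

```python
import os
import threading


def hammer_paths(paths, threads=16, rounds=100):
    """stat() every path from many threads at once.

    Returns a list of (path, errno) for every access that failed.
    """
    errors = []
    lock = threading.Lock()
    barrier = threading.Barrier(threads)

    def worker():
        barrier.wait()  # release all threads together to maximize overlap
        for _ in range(rounds):
            for p in paths:
                try:
                    os.stat(p)
                except OSError as e:
                    with lock:
                        errors.append((p, e.errno))

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return errors
```

An empty result only shows that no access returned an error; hangs and stale mounts would still need to be checked separately, for example with a timeout around the whole call and an inspection of the mount table afterwards.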