Matrix robustness section

From Linux NFS


Latest revision as of 11:37, 9 November 2005

Section III

ROBUSTNESS TESTING

==Basic stability assessments==
{|border="1" width="100%" cellpadding="1" cellspacing="0" style="font-size: 85%; border: gray solid 1px; border-collapse: collapse; text-align: center; width: 100%"
!style="background: #ececec;"|'''ID'''
!style="background: #ececec;"|'''test'''
!style="background: #ececec;"|'''tool test'''
!style="background: #ececec;"|'''status'''
!style="background: #ececec;"|'''owner'''
!style="background: #ececec;"|'''notes'''
|-
|III.A.1
|Run iozone for 2 weeks on basic client/server operations, using:
*Both data and metadata options
*Cached and direct I/O
*Various mount options
|IOzone
|'''Done'''
|Bull
|Now testing with fsstress and FFSB
|-
|III.A.2
|Run automounter use case for 2 weeks on amd, autofs, and autong, using:
*Large number of maps
*Randomly mount and run workloads on an automounted partition
*Use a variety of workloads, such as randomly chosen fs tests
|e.g. Crashme [http://people.delphiforums.com/gjc/crashme.html more]
|'''New'''
|
|
|-
|III.A.3
|Run NFS server for 2 weeks with random configuration changes, using:
*Interrupt server in various ways (reboot, power cycle, LAN failure)
*Change/reexport export rules at random
*Trigger a client workload at arbitrary times
*Analyze client recovery behaviors
|
|'''Open'''
|OSDL
|
|-
|III.A.4
|Run connectathon locking tests against NFS server for 2 weeks, using:
*Multiple client machines
*Reboot at random
*Analyze client cache coherency behaviors
*Analyze locking behaviors
|
|'''New'''
|
|
|-
|III.A.5
|Run fsstress for 2 weeks on basic client/server operations, using:
*Long list of random operations (1000 operations)
*High number of processes (100)
|fsstress
|'''[[Robustness_testing#Main_results|Done]]'''
|Bull
|1 week
|-
|III.A.6
|Run FFSB for 1 day on basic client/server operations in stress configuration, using:
*1,200,000 files
*100 directories
|ffsb
|'''[[Robustness_testing#Main_results|Done]]'''
|Bull
|1 day
|}
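
The III.A.5 parameters (1000 operations, 100 processes) could map onto an fsstress invocation roughly as follows. This is a minimal sketch, not the command Bull actually ran: it assumes an fsstress binary that accepts the commonly documented <code>-d</code> (base directory), <code>-n</code> (operations per process) and <code>-p</code> (number of processes) options, and the mount point <code>/mnt/nfs/stress</code> is hypothetical.

```python
import shlex


def fsstress_command(test_dir, operations=1000, processes=100, binary="fsstress"):
    """Build the argv list for one fsstress run (it is not executed here).

    Assumption: the local fsstress build understands -d, -n and -p.
    """
    if operations <= 0 or processes <= 0:
        raise ValueError("operations and processes must be positive")
    return [binary, "-d", test_dir, "-n", str(operations), "-p", str(processes)]


if __name__ == "__main__":
    # Hypothetical NFS mount point; print the command instead of running it.
    print(shlex.join(fsstress_command("/mnt/nfs/stress")))
```

Returning the argument list rather than running it keeps the sketch safe to try; passing the list to <code>subprocess.run</code> in a loop would be one way to stretch the run over days or weeks.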


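==Resource limit testing==
{|border="1" width="100%" cellpadding="1" cellspacing="0" style="font-size: 85%; border: gray solid 1px; border-collapse: collapse; text-align: center; width: 100%"
!style="background: #ececec;"|'''ID'''
!style="background: #ececec;"|'''test'''
!style="background: #ececec;"|'''tool test'''
!style="background: #ececec;"|'''status'''
!style="background: #ececec;"|'''owner'''
!style="background: #ececec;"|'''notes'''
|-
|III.B.1
|Test stability of client in an out-of-PID situation
|
|
|
|
|-
|III.B.2
|Test stability of client in an out-of-memory situation
|valgrind
|'''New'''
|
|
|-
|III.B.3
|Test stability of client when the server is out of disk space
|dd, fsstress
|'''Done'''
|Bull
|Simple error message: ''No space left on device''
|-
|III.B.4
|Test stability of client in an out-of-inode situation
|
|
|
|
|-
|III.B.5
|Test stability of client in an out-of-swap-space situation
|
|
|
|
|-
|III.B.6
|Test stability of server in an out-of-PID situation
|
|
|
|
|-
|III.B.7
|Test stability of server in an out-of-memory situation
|valgrind
|'''New'''
|
|
|-
|III.B.8
|Test stability of server in an out-of-disk-space situation
|dd, fsstress
|'''Done'''
|Bull
|Simple error message: ''No space left on device''
|-
|III.B.9
|Test stability of server in an out-of-inode situation
|
|
|
|
|-
|III.B.10
|Test stability of server in an out-of-swap-space situation
|
|
|
|
|}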
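
The out-of-disk-space tests (III.B.3 and III.B.8) were run with dd and fsstress; the same check can be sketched in Python as below. This is an illustrative sketch only: the function name, the chunk size and the injectable <code>write</code> parameter are assumptions made here, not part of the original tools. It appends data until the write fails with <code>ENOSPC</code>, which is the errno behind the ''No space left on device'' message.

```python
import errno
import os


def fill_until_full(path, chunk_size=1 << 20, max_bytes=None, write=os.write):
    """Append zero-filled chunks to `path` until ENOSPC is reported.

    Returns (bytes_written, hit_enospc). `max_bytes` is a safety cap;
    `write` can be replaced by a stub to simulate a full device.
    """
    chunk = b"\0" * chunk_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        while max_bytes is None or written < max_bytes:
            try:
                n = write(fd, chunk)
                # Force the data out now so a deferred error surfaces here.
                os.fsync(fd)
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    return written, True
                raise  # any other error is unexpected for this test
            written += n
        return written, False
    finally:
        os.close(fd)
```

The <code>fsync</code> after each write is deliberate: an NFS client may cache writes, so the error can otherwise appear only at a later <code>fsync</code> or <code>close</code>. A stable client is expected to see a clean <code>ENOSPC</code> rather than a hang or data corruption.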

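==Stress load testing==
{|border="1" width="100%" cellpadding="1" cellspacing="0" style="font-size: 85%; border: gray solid 1px; border-collapse: collapse; text-align: center; width: 100%"
!style="background: #ececec;"|'''ID'''
!style="background: #ececec;"|'''test'''
!style="background: #ececec;"|'''tool test'''
!style="background: #ececec;"|'''status'''
!style="background: #ececec;"|'''owner'''
!style="background: #ececec;"|'''notes'''
|-
|III.C.1
|Run stress tools in a standard configuration on each release
|fsx, fsstress, ffsb
|'''In progress'''
|Bull
|fsstress and ffsb are each run for 1 hour
|-
|III.C.2
|Analyze load balancing, failure modes, etc. under different stress loads
|
|'''New'''
|
|
|-
|III.C.3
|Destructive testing by measuring point of failure for various loads
|
|'''New'''
|
|
|}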

==Scalability (robustness)==
{|border="1" width="100%" cellpadding="1" cellspacing="0" style="font-size: 85%; border: gray solid 1px; border-collapse: collapse; text-align: center; width: 100%"
!style="background: #ececec;"|'''ID'''
!style="background: #ececec;"|'''test'''
!style="background: #ececec;"|'''tool test'''
!style="background: #ececec;"|'''status'''
!style="background: #ececec;"|'''owner'''
!style="background: #ececec;"|'''notes'''
|-
|III.D.1
Part 1: without load
|Find maximum number of connections to Linux IA-32 server
|Fsstress, fsx
|'''In progress'''
|Bull (partial)
|Maximum number of connections to a Linux IA-32 server is '''31998'''. ''Warning:'' these connections are made by a simple mount on the client. The exported file system is mounted on different directories until an error is reported. In this test the client and server are on the same machine, and the error reported is ''[Errno 31] Too many links''. See test III.D.4 and the [http://nfsv4.bullopensource.org/tools/tests/page25.php test results].
|-
|III.D.1
Part 2: with load
|Find maximum number of connections to Linux IA-32 server
|IOzone
|'''In progress'''
|Bull (partial)
|A single dual-processor IA-32 machine is able to serve 2048 clients under high load. 2048 simultaneous instances of IOzone's standard tests were launched. They completed successfully after 6 days. (Note: a single IOzone instance takes about 45 minutes to complete the test, depending on the test conditions.) '''[http://nfsv4.bullopensource.org/tools/tests/page26.php Read more]'''
|-
|III.D.2
|Find maximum number of files for Linux IA-32 exported file system
|[http://nfsv4.bullopensource.org/tools/tests_tools/test_files.py ad hoc tool]
|'''Done'''
|Bull
|A first [http://nfsv4.bullopensource.org/tools/tests/page18.php test] gave a limit between 15,699 and 15,799 files. After some patches, the [http://nfsv4.bullopensource.org/tools/tests/page19.php new limit] is over 250,000 files. This limit is the maximum number of files NFS is able to list in a shared directory. Creation time was [http://nfsv4.bullopensource.org/tools/tests/page20.php measured], and behaviour changes when a directory holds more than 1,620,000 files. This limitation is now fixed (Linux 2.6.13 and later).
|-
|III.D.3
|Find maximum file size on Linux IA-32
|[http://nfsv4.bullopensource.org/tools/tests_tools/bigfile.tar.gz ad hoc tool]
|'''Done'''
|Bull
|The maximum file size is the largest size the server's local file system can manage: 8 TB for NFSv4/XFS/IA-32.
|-
|III.D.4
|Find maximum number of mounted file systems on client
|[http://nfsv4.bullopensource.org/tools/tests_tools/test_mount.py ad hoc tool]
|'''Done'''
|Bull
|Maximum number of connections to a Linux IA-32 server is '''31998'''. ''Warning:'' these connections are made by a simple mount on the client. The exported file system is mounted on different directories until an error is reported. In this test the client and server are on the same machine, and the error reported is ''[Errno 31] Too many links''. See the [http://nfsv4.bullopensource.org/tools/tests/page25.php test results].
|-
|III.D.5
|Test robustness on NUMA when scaling CPU, memory, NIC, or disk count
|Fsstress, fsx
|'''New'''
|
|
|-
|III.D.6
|Test robustness on SMP when scaling CPU, memory, NIC, or disk count
|Fsstress, fsx
|'''New'''
|Bull (partial)
|
|-
|III.D.7
|Test correctness of NFS client when backed by a large (>100GB) cachefs
|
|'''New'''
|
|
|-
|III.D.8
|Find maximum number of exported file systems on server
|[http://nfsv4.bullopensource.org/tools/tests_tools/test_mount.py ad hoc tool]
|'''Nearly done'''
|Bull
|No limit reached, but the export process is too slow to be acceptable beyond 3000 exports. 13000 exports were reached. See [http://nfsv4.bullopensource.org/tools/tests/page17.php this] page for more information.
|-
|III.D.9
|Find maximum size of exported file systems on server
|
|'''Done'''
|
|The maximum size of an exported file system on the server does not depend on NFS or the NFS tools. The NFSv4 tools do not use file system size information. The limit is that of the file system itself.
|-
|III.D.10
|Access a single locked file with a large number of clients
|locktester
|'''Done'''
|Bull
|Many bugs fixed. Working [http://nfsv4.bullopensource.org/tools/tests/page41.php now].
|-
|III.D.11
|Divide a file into a high number of locked sections
|locktester
|'''Done'''
|Bull
|Many bugs fixed. Working [http://nfsv4.bullopensource.org/tools/tests/page41.php now].
|}
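
The procedure described for III.D.1 (part 1) and III.D.4, mounting the same export on a new directory until an error is reported, can be sketched as follows. This is not the <code>test_mount.py</code> tool linked above; the function names, the <code>m0, m1, ...</code> directory scheme and the <code>mount -t nfs4</code> invocation are assumptions chosen for illustration, and a real run requires root.

```python
import errno
import os
import subprocess


def nfs_mount(export, mount_point):
    """Mount `export` on `mount_point` via mount(8). Needs root privileges."""
    subprocess.run(["mount", "-t", "nfs4", export, mount_point], check=True)


def count_mounts(export, base_dir, limit, mount=nfs_mount):
    """Create base_dir/m0, m1, ... and mount `export` on each one.

    Stops at the first error and returns (successful_mounts, error),
    or (limit, None) if no error occurred within `limit` attempts.
    """
    for i in range(limit):
        mount_point = os.path.join(base_dir, f"m{i}")
        try:
            os.mkdir(mount_point)       # EMLINK would be raised here
            mount(export, mount_point)  # mount failures are raised here
        except (OSError, subprocess.CalledProcessError) as exc:
            return i, exc
    return limit, None


def is_too_many_links(error):
    """True if `error` is the '[Errno 31] Too many links' case (EMLINK)."""
    return isinstance(error, OSError) and error.errno == errno.EMLINK
```

Errno 31 is <code>EMLINK</code> on Linux, which <code>mkdir</code> returns when the parent directory cannot accept more links. The 31998 figure may therefore reflect the subdirectory limit of the local file system holding the mount points rather than a limit in NFS itself; this is an interpretation, not something the test report states. Spreading the mount points over several parent directories would be one way to check.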

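==Recovery from problems while under light/normal/heavy loads==
{|border="1" width="100%" cellpadding="1" cellspacing="0" style="font-size: 85%; border: gray solid 1px; border-collapse: collapse; text-align: center; width: 100%"
!style="background: #ececec;"|'''ID'''
!style="background: #ececec;"|'''test'''
!style="background: #ececec;"|'''tool test'''
!style="background: #ececec;"|'''status'''
!style="background: #ececec;"|'''owner'''
!style="background: #ececec;"|'''notes'''
|-
|III.E.1
|Test short- and long-term local network failure (unplugged cable, ifdown eth0, etc.)
|
|'''Open'''
|OSDL
|
|-
|III.E.2
|Test short- and long-duration remote network partition
|
|'''Open'''
|OSDL
|
|-
|III.E.3
|Test behavior during crash/reboot of server with clients holding various states
|
|'''Open'''
|OSDL
|[ftp://ftp.cis.uoguelph.ca/pub/nfsv4/testing-stuff more]
|-
|III.E.4
|Test multiple clients using, locking, etc. the same files
|
|'''Done'''
|Bull
|[http://nfsv4.bullopensource.org/tools/tests/page41.php More]
|-
|III.E.5
|Test behavior of server with failed storage device
|
|'''New'''
|
|
|-
|III.E.6
|Test behavior during crash of client with open delegations and locks
|
|'''New'''
|
|
|-
|III.E.7
|Test recovery from denied permission
|
|'''New'''
|
|
|-
|III.E.8
|Test recovery from JUKEBOX/DELAY
|
|'''New'''
|
|
|-
|III.E.9
|Test recovery from ESTALE
|
|'''New'''
|
|
|}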
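
The locking tests (III.E.4 here, and III.D.10 and III.D.11 in the scalability table) were run with locktester. The sketch below is not that tool; it only illustrates, under stated assumptions, what "dividing a file into a high number of locked sections" can look like with POSIX byte-range locks from one process. The function name and section size are hypothetical.

```python
import fcntl
import os


def lock_sections(path, sections, section_size=1):
    """Lock `sections` consecutive byte ranges of `path` without blocking.

    Even-numbered sections get an exclusive lock and odd-numbered ones a
    shared lock. Assumption: the kernel may merge adjacent same-type locks
    held by one owner, so alternating types keeps the ranges distinct.
    Returns the open file descriptor; closing it releases every lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, sections * section_size)
        for i in range(sections):
            kind = fcntl.LOCK_EX if i % 2 == 0 else fcntl.LOCK_SH
            fcntl.lockf(fd, kind | fcntl.LOCK_NB,
                        section_size, i * section_size, os.SEEK_SET)
    except OSError:
        os.close(fd)
        raise
    return fd
```

Running this against a file on an NFS mount from several clients at once, each with a different slice of the section range, would approximate the multi-client case. POSIX locks belong to the process, so conflicts only show up between different processes or clients, never within the process that took the locks.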

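==Race conditions==
{|border="1" width="100%" cellpadding="1" cellspacing="0" style="font-size: 85%; border: gray solid 1px; border-collapse: collapse; text-align: center; width: 100%"
!style="background: #ececec;"|'''ID'''
!style="background: #ececec;"|'''test'''
!style="background: #ececec;"|'''tool test'''
!style="background: #ececec;"|'''status'''
!style="background: #ececec;"|'''owner'''
!style="background: #ececec;"|'''notes'''
|-
|III.F.1
|Test for race conditions and locking bugs on PPC64
|
|'''New'''
|(Polyserve?)
|Olaf Kirch says PPC64 is good at exposing problems because of its weak CPU cache coherency semantics
|-
|III.F.
|Test for race conditions on new architectures
|
|'''New'''
|(Polyserve?)
|Faster CPUs, memory, and buses can expose race conditions
|}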

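==Automounter robustness==
For more information about the automounter, see the notes in the nfsv4 list archive for 2/16/05.
{|border="1" width="100%" cellpadding="1" cellspacing="0" style="font-size: 85%; border: gray solid 1px; border-collapse: collapse; text-align: center; width: 100%"
!style="background: #ececec;"|'''ID'''
!style="background: #ececec;"|'''test'''
!style="background: #ececec;"|'''tool test'''
!style="background: #ececec;"|'''status'''
!style="background: #ececec;"|'''owner'''
!style="background: #ececec;"|'''notes'''
|-
|III.G.1
|Test interruptible automounting in the following cases:
*indirect mount
*direct mount
*browsed mount
*multimount offset
|
|'''New'''
|
|
|-
|III.G.2
|Run concurrent access tests for races in the automounter:
*Have multiple threads working in parallel
|
|'''New'''
|
|
|-
|III.G.3
|Test replicated file system selection
|
|'''New'''
|
|
|-
|III.G.4
|Test remounting after expiry, corner cases:
*Something (a process) sitting in the scaffolding
*Common case for /net
|
|'''New'''
|
|Needs to be supported at NFS level
|}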
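
For III.G.2, "multiple threads working in parallel" could be approximated by having several threads touch the same automounted paths at once, so that they race to trigger the same mount. No tool is named for this test, so the following is only a sketch: the function name, the thread counts and the use of <code>os.stat</code> as the trigger are all assumptions.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def hammer_paths(paths, threads_per_path=8, rounds=10):
    """Stat every path from `threads_per_path` threads, `rounds` times each.

    Returns {path: number_of_failed_stat_calls}. On an automount point,
    the first stat triggers the mount and the others race against it.
    """
    def probe(path):
        failures = 0
        for _ in range(rounds):
            try:
                os.stat(path)
            except OSError:
                failures += 1
        return path, failures

    # One job per (path, thread) pair so each path is hit concurrently.
    jobs = [p for p in paths for _ in range(threads_per_path)]
    totals = Counter({p: 0 for p in paths})
    with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
        for path, failures in pool.map(probe, jobs):
            totals[path] += failures
    return dict(totals)
```

A non-zero failure count for a path that should be mountable, or a hang, would be the signal to investigate. Repeating the run after the automounter's expiry timeout would also exercise the remount-after-expire cases in III.G.4.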