Cthon06 Meeting Notes
From Linux NFS
Linux pNFS Implementation meeting at Connectathon 2006
Note that these are raw, unprocessed notes!
pNFS client: walk through mount, open, and I/O WRT pNFS operations. LD = layout driver
0) register LD
1) MOUNT
a) FS_LAYOUT_TYPES - currently only one
b) Match returned types to LD
c) register SB for pnfs private area use
d) LD -> client GETDEVICELIST
ISSUE: why do GETDEVICELIST at mount? current client doesn't have a clientid until first open - device list might change
- pVFS2 doesn't use GETDEVICELIST
- file layout could wait until open
- block (object) will call immediately to determine connectivity
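Steps 0 and 1a-1b above can be pictured as a lookup against a table of registered layout drivers. The sketch below is illustrative Python, not kernel code: the registry, `register_layout_driver`, `pick_layout_driver`, the driver name, and the numeric type values are all assumptions made for the example.

```python
# Hypothetical model of steps 0 and 1a-1b: layout drivers register
# themselves, and mount matches the server's advertised layout types
# against that registry. Names and numeric values are illustrative only.

LAYOUT_FILES = 1
LAYOUT_OBJECTS = 2
LAYOUT_BLOCKS = 3

_registered_drivers = {}


def register_layout_driver(layout_type, name):
    """Step 0: a layout driver (LD) announces the layout type it handles."""
    _registered_drivers[layout_type] = name


def pick_layout_driver(server_layout_types):
    """Steps 1a-1b: return the LD for the first server type we support.

    Returns None when nothing matches; the caller would then do plain
    NFS I/O through the metadata server.
    """
    for layout_type in server_layout_types:
        if layout_type in _registered_drivers:
            return _registered_drivers[layout_type]
    return None


register_layout_driver(LAYOUT_FILES, "file-layout-driver")
print(pick_layout_driver([LAYOUT_BLOCKS, LAYOUT_FILES]))
print(pick_layout_driver([LAYOUT_OBJECTS]))
```

Whether GETDEVICELIST then runs at mount or waits until open (the ISSUE above) would be a per-driver decision layered on top of this match.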
2) LAYOUTGET
a) layoutget in OPEN compound? opportunity for LD to say whether or not to do LAYOUTGET
ISSUE:
- might already have a layout
- blocks case OPEN references hardlink? one open for write, one open for read
b) currently waits until first I/O, then LD -> LAYOUTGET
QUESTION: file size - do we get a layout or not?
trond - client. when vm asks for io - per page or larger extent basis. nfs client tracks this because in one case it can coalesce io or not. obvious to nfs whether or not to call into pNFS
marc - wants server policy e.g. ask server
garth - what if you already have a layout?
no way for server to communicate.
brent - ask client layout driver
trond - nfs_flush_list() list of request, wsize requests on wire. in pnfs case call into LD to see if it wants to deal with it.
brent: who sets up parameters on layout get?
dean: client asks for io size, server decides layout size
trond: 5 byte file, why should client ask for a layout?
benny: load distribution (10,000 5 byte files going through MDS)
trond/bruce: MDS needs to do state resources for each layout
brent/marc: cache small files in data server, need to redirect
ask mds prior to getting layout. or garth: IO threshold for pNFS? per file attr for getattr?
layout driver should have some control over the range asked?
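One way to picture the I/O threshold idea garth raised is a small policy check made before LAYOUTGET. This is only a sketch: `LayoutPolicy`, `io_threshold`, and `want_layout` are hypothetical names, and the notes leave open whether such a threshold would come from the server (e.g. a per-file attribute) or from the layout driver. The "no policy means always ask" default is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayoutPolicy:
    # Hypothetical: smallest I/O size (bytes) considered worth a LAYOUTGET.
    io_threshold: int


def want_layout(io_size: int, have_layout: bool,
                policy: Optional[LayoutPolicy]) -> bool:
    """Decide whether to issue LAYOUTGET before this I/O."""
    if have_layout:
        return False   # already holding a layout, nothing to fetch
    if policy is None:
        return True    # assumed default: no policy known, always ask
    return io_size >= policy.io_threshold
```

With a 4096-byte threshold, trond's 5 byte file would go through the MDS, while a 64 KB write would trigger a LAYOUTGET.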
c) where is it stored? layout is stored off inode.
ISSUE: private pointer off inode, private area, hang as much stuff that you want to - all private to LD
trond: may want to reuse vfs locking code - new type of file_lock don't want to add lock management code into the nfs client.
- properties of posix locks - coalesce locks
3) READ
two ways: standard page cache (normal nfs clients), pVFS2 way.
regular: read ahead code in VM to determine actual size. calls into nfs code to read. needs to id the layout (get the whole inode, layout pointer). gives page list. regular read
brent: any hint from file system?
trond: primiv - largest you can accept, some hooks
dean: different read ahead size for data servers
garth: read aheads span stripes by just a small amount
trond: modification to vm read-ahead code, currently reads in PAGE_SIZE
brent: preferred max matches up to the stripe - stripe aligned?
trond: always page aligned - chunk size aligned
read-ahead: application decides how much to read ahead
brent: important ask DS 1K ask 64 DS for 1K - same amount of time
trond end up filling up much more of the page cache and confusing VM LRU ...
random access vs.
brent: read ahead needs chunk alignment within page alignment notion
dean: currently have an interface to ask for this now (in terms of PAGES). stripe size is a multiple of PAGE_SIZE
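The "chunk alignment within page alignment" point can be worked through with a little arithmetic. The sketch below assumes a 4096-byte page and a hypothetical helper `align_readahead`; it only shows the rounding, not how the VM read-ahead code would actually be hooked.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def align_readahead(offset, length, stripe_unit):
    """Expand [offset, offset + length) outward to stripe-unit boundaries.

    stripe_unit must be a positive multiple of PAGE_SIZE, so the result
    is page aligned as well. Returns (start, end) in bytes, end exclusive.
    """
    if stripe_unit <= 0 or stripe_unit % PAGE_SIZE != 0:
        raise ValueError("stripe unit must be a positive multiple of PAGE_SIZE")
    start = (offset // stripe_unit) * stripe_unit
    # Round the end of the range up to the next stripe-unit boundary.
    end = ((offset + length + stripe_unit - 1) // stripe_unit) * stripe_unit
    return start, end
```

For a 64 KB stripe unit, a 10000-byte read at offset 70000 widens to the single stripe unit covering bytes 65536 to 131072, so the read-ahead does not spill a small amount into the next stripe.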
trond: O_DIRECT needs some consideration.
dean: can't get more than a half meg out of the VM
d) last void * pointer in the read/write interface. lots of nfs code and then needs to call LD, so used to need type checking: declare struct blah; in pvfs2
trond: setting it up as a cookie is done when you have two or more that need it. not when you have one.
ISSUE:
4) WRITE
a) uses standard coalesce (stripe size) nfs code to construct wsize chunks
different for O_DIRECT - map user mem into vm space, not file system specific.
dichotomy - like to convert the O_DIRECT AIO support only for O_DIRECT. normal code writes go into the vm, and returns. generic AIO code hacky - waiting for locks, etc
b) sent to write function. write page list
c) marks pages for commit: close, fsync, mem pressure
d) instead of calling regular commit, calls LD commit. LD commit - to data servers.
ISSUE: if not using nfs code for commits (e.g. your own), when can you clear pages? LD needs to clear its own pages on the commit operation
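Step 4a, coalescing dirty pages into wsize chunks, can be sketched roughly as follows. This is a simplified model, not the nfs client code: `coalesce_pages` is a hypothetical helper, the 4096-byte page size is assumed, and a layout driver could presumably pass its stripe size in place of wsize.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def coalesce_pages(page_indices, wsize):
    """Group dirty page indices into contiguous runs of at most wsize bytes.

    Returns a list of (first_page, page_count) tuples, one per request.
    """
    max_pages = max(1, wsize // PAGE_SIZE)
    requests = []
    for index in sorted(set(page_indices)):
        if requests:
            first, count = requests[-1]
            # Extend the current request only if this page is adjacent
            # and the request still has room.
            if index == first + count and count < max_pages:
                requests[-1] = (first, count + 1)
                continue
        requests.append((index, 1))
    return requests
```

With an 8 KB wsize, dirty pages 0-3 and 7-8 become three two-page requests; each request would then be handed to the write function (step 4b) as a page list.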
5) LAYOUTCOMMIT:
called on fsync, stat calls, close, lock, locku, on trunc setattr? triggered by user land call.
ISSUE: garth: preferred that it is done prior to any getattr, but getattr is always called.
benny: only stat system call getattr.
don't overwrite the file size on client on getattr if haven't sent layout commit.
added flag to inode structure: writes have happened, no layout commit.
set flag when writes are issued. used to trigger a LAYOUTCOMMIT on fsync, etc
and then cleared.
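The flag behaviour described above can be modelled in a few lines. This is a toy stand-in, not the inode structure itself: `InodeState`, `layoutcommit_needed`, and the counter are hypothetical names, and "sending" a LAYOUTCOMMIT is reduced to bumping that counter.

```python
class InodeState:
    """Hypothetical stand-in for the per-inode pNFS state in the notes."""

    def __init__(self, size=0):
        self.size = size                   # client's view of the file size
        self.layoutcommit_needed = False   # writes happened, no LAYOUTCOMMIT yet
        self.layoutcommits_sent = 0        # stands in for actual RPCs

    def write(self, offset, length):
        self.size = max(self.size, offset + length)
        self.layoutcommit_needed = True    # set when writes are issued

    def fsync(self):
        # fsync (like close, lock, etc.) triggers LAYOUTCOMMIT only if needed.
        if self.layoutcommit_needed:
            self.layoutcommits_sent += 1
            self.layoutcommit_needed = False

    def getattr_reply(self, server_size):
        # Don't overwrite the local size while a LAYOUTCOMMIT is pending;
        # the metadata server may not know the new size yet.
        if not self.layoutcommit_needed:
            self.size = server_size
        return self.size
```

After a write that extends a 100-byte file to 150 bytes, a getattr reply still carrying 100 leaves the client's size at 150 until fsync has sent the LAYOUTCOMMIT and cleared the flag.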
ISSUE: which creds to use? add a pointer to the write creds, bump counter. use nfsopen context
ISSUE: who constructs LAYOUTCOMMIT - currently generic pnfs; needs to be an LD call because different byte ranges etc.
LD->LAYOUTCOMMIT send an array of commits. block layout uses layout update structure, object
Small files
currently three round trips OPEN READ CLOSE whole file current stateid = down to one