Introduction to Linux NFS hacking
Latest revision as of 19:44, 15 December 2020
This is an attempt to provide pointers to the basic information necessary to start hacking the Linux NFS implementation. I assume that you know C and know the basics of administering a Linux box (so I assume, for example, that you know how to build and install a new kernel). I don't assume a knowledge of kernel internals.
setting up NFS
NFS is easy to set up and use; follow instructions for your distribution and play around a bit.
understanding NFSv4
The authoritative sources are RFC 7530 (NFSv4.0), RFC 8881 (NFSv4.1; it obsoletes RFC 5661), and related documents from the NFSv4 IETF working group. Don't read them from start to finish! They're too long. But keep them on hand to refer to when you need to understand something specific.
The best way to watch NFSv4 at work is to run it while watching your network with a packet sniffer. Use Wireshark: it's widely available and has up-to-date support for NFSv4. Your traffic doesn't have to be going over a "real" network for this to work; if your client and server are on the same machine, just sniff the loopback interface ("lo").
Wireshark also has a companion program, tshark, with a text-only interface.
I usually adjust the Wireshark preferences to give the "Packet Details" panel the full height of the window. You may also need to set:
    Protocols->TCP->"Allow subdissector to desegment TCP streams"
    Protocols->IP ->"Reassemble fragmented IP datagrams"
    Protocols->RPC->"Desegment all RPC-over-TCP messages"
    Protocols->RPC->"Defragment all RPC-over-TCP messages"
In addition to letting you provide filters in the capture dialog or (with the -f option) on the command line, Wireshark also gives you some help constructing filters after the fact: right-click on an element in the middle pane and play around with the "prepare" and "match" menus. One additional hint: right-clicking on an element and then choosing "expand tree" recursively expands everything in that element.
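For captures on a machine without a display, the same idea works with tshark. The following is a minimal sketch, not a recipe: it assumes the server uses the standard NFS port 2049 over TCP, and the function names are made up for illustration.

```shell
# Build a capture filter for NFS traffic. Port 2049 is the standard NFS
# port; pass a different one if your server is configured otherwise.
nfs_capture_filter() {
    printf 'tcp port %s' "${1:-2049}"
}

# Capture on an interface (default: loopback) and print only decoded NFS
# packets. -f is the capture filter, -Y the display filter.
# Capturing usually requires root.
nfs_capture() {
    tshark -i "${1:-lo}" -f "$(nfs_capture_filter)" -Y nfs
}
```

Run nfs_capture in one terminal and mount or touch a file over NFS in another; adding -V to the tshark line prints the full packet details tree instead of one summary line per packet.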
reading kernel code
The best way to understand how some part of the kernel works is usually just to read the code.
The best way to get the code is to install git, then run:
    # get linus's mainline tree:
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    # the clone lands in a directory named after the repository:
    cd linux-2.6
    # if you want to see in-progress kernel work from other developers, you can add them with, for example:
    git remote add -f trond git://linux-nfs.org/pub/linux/nfs-2.6.git
    git remote add -f bfields git://linux-nfs.org/~bfields/linux.git
Then you can check out different versions with:
    git checkout v2.6.25
    git checkout v2.6.26-rc1
    git remote show trond      # what branches does Trond have?
    git checkout trond/devel   # check out the tip of Trond's "devel" branch
and download new updates with:
    git fetch origin
    git fetch trond
    git fetch bfields
(But note this does not affect your working directory; if you want to see what's new on some branch, you'll need to run "git checkout" again.)
"git grep" is useful for finding your way around, but you may also want to set up a good text editor integrated with a database of code cross-references. I use cscope and vim. The cscope home page has instructions on using cscope with vim and emacs, and instructions on using cscope on a large project like the kernel without waiting forever for the indices to build. This allows you to follow the flow of control easily by popping quickly from the use of a function to its definition and back.
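As a rough sketch of the cscope setup, the index can be limited to the NFS-related directories so that it builds quickly. The directory list and the function names below are illustrative assumptions, not a canonical set.

```shell
# Print the C sources and headers under the given directories, sorted.
nfs_source_files() {
    find "$@" -type f -name '*.[ch]' | sort
}

# Build a cscope database for the NFS client, server, lockd, and sunrpc.
# Run this from the top of the kernel tree.
build_nfs_cscope() {
    nfs_source_files fs/nfs fs/nfsd fs/lockd net/sunrpc include/linux/sunrpc \
        > cscope.files
    # -b: build the database and exit; -q: add an inverted index for
    # faster lookups; -k: kernel mode, do not index /usr/include.
    cscope -b -q -k
}
```

cscope reads the file list from cscope.files by default, so no extra option is needed to point it there. For a single lookup, "git grep -n <symbol> -- fs/nfs" is often quicker than building an index.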
Take notes. As an example, I keep some outdated notes on the kernel. In many cases they're too rough to be of use to someone else, but they help me organize my thoughts while I'm learning something new.
It's easy to get led astray if you attempt to understand large subsystems all at once. Instead, try to keep in mind one small goal (e.g., to fix a bug, to learn how to use a certain interface).
Robert Love's "Linux Kernel Development" gives a good overview if read side-by-side with the kernel code. "Linux Device Drivers" is also good, as is "Understanding the Linux Kernel". See also lwn.net's kernel coverage.
NFS Debugging
You can use the rpcdebug command (included in nfs-utils) to get additional debugging information dumped in your logs.
(To see the code that produces this, see include/linux/sunrpc/debug.h, include/linux/nfs_fs.h, include/linux/nfsd/debug.h, the NFSDDBG_FACILITY defines at the top of each .c file, and the dprintk()'s sprinkled throughout.)
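A typical rpcdebug session might look like the sketch below. The nfs_debug wrapper and the RPCDEBUG variable are illustrative additions; the options themselves are rpcdebug's own: -m selects the module (nfs, nfsd, nlm, or rpc), and -s and -c set and clear debug flags.

```shell
# Command to run; override it to point at a different rpcdebug binary.
RPCDEBUG=${RPCDEBUG:-rpcdebug}

# Turn all debug flags on or off for one module (default: nfs).
# usage: nfs_debug on|off [module]
nfs_debug() {
    case "$1" in
        on)  "$RPCDEBUG" -m "${2:-nfs}" -s all ;;
        off) "$RPCDEBUG" -m "${2:-nfs}" -c all ;;
        *)   echo "usage: nfs_debug on|off [module]" >&2; return 1 ;;
    esac
}
```

As root, run "nfs_debug on", reproduce the problem, read the messages with dmesg, and then run "nfs_debug off"; the output can be verbose, so it is best left on only briefly.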
Increasingly, though, we're moving away from dprintk()'s and towards tracing. XXX: add a pointer to tracing tutorials, and an example of use of nfs tracepoints.
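As one possible example of the tracepoints, whole event groups can be switched on through tracefs. This sketch assumes tracefs is mounted at /sys/kernel/tracing (older systems use /sys/kernel/debug/tracing) and that the relevant module is loaded so that its events directory exists; the function name is made up.

```shell
# Location of tracefs; override if it is mounted elsewhere.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Enable or disable every tracepoint in one event group (default: nfs).
# Other groups of interest include nfs4, nfsd, and sunrpc.
# usage: nfs_trace on|off [group]
nfs_trace() {
    case "$1" in
        on)  val=1 ;;
        off) val=0 ;;
        *)   echo "usage: nfs_trace on|off [group]" >&2; return 1 ;;
    esac
    echo "$val" > "$TRACEFS/events/${2:-nfs}/enable"
}
```

With a group enabled (as root), "cat /sys/kernel/tracing/trace_pipe" streams the events as they happen.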
Contributing
See the tutorial and user's manual for an introduction to git.
Generally, though, git pull requests are only used when a maintainer needs to batch up a huge number of changes to submit to Linus or another maintainer.
Most of the time when we need to make a change, we do it by mailing a patch (or a patch series). See Documentation/SubmittingPatches (Documentation/process/submitting-patches.rst in current trees) in your friendly local kernel tree.
Patches are the basic unit of communication with other kernel hackers. They should be readable by humans, not just by the patch command.
Often if you work on changes in git for a while, the history of your work will be messy, with some dead ends, errors in early patches that are fixed in later patches, and overly complicated patches that attempt to do too many things at once. So, usually we end up needing to rewrite the history to present it in a readable form.
The chapter Rewriting history and maintaining patch series may be useful.
The goal is:
- Each patch should be short and do only one thing. This may mean that, after writing a bunch of new code to implement a new feature, you need to spend some time breaking up the code into smaller patches which introduce the new feature in easier-to-understand chunks.
- Each patch should have a changelog which explains *why* you are making the change (not just what the change is).
- When dealing with a long series of patches, no individual patch should introduce a compile-time or run-time regression. It should be possible to apply any initial part of the series and still end up with a kernel that builds and runs at least as well as the unpatched kernel.
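One way these goals translate into git commands is sketched below. The base branch, the output directory, the build command, and the GIT variable are assumptions for illustration; adapt them to your own tree.

```shell
# Command to run; override it to use a different git binary.
GIT=${GIT:-git}

# Clean up and export the commits between a base and HEAD as patches.
# usage: prepare_series [base]
prepare_series() {
    base=${1:-origin/master}
    # Reorder, squash, split, and reword commits into a readable series.
    "$GIT" rebase -i "$base"
    # Rebuild after every commit and stop at the first one that fails,
    # so that no patch in the series breaks the build.
    "$GIT" rebase --exec 'make -s' "$base"
    # Write one file per commit, plus a cover letter, into outgoing/.
    "$GIT" format-patch --cover-letter -o outgoing "$base"
}
```

After reviewing the files in outgoing/ and filling in the cover letter, they can be mailed with, for example, "git send-email --to=linux-nfs@vger.kernel.org outgoing/*.patch". The --exec step only checks that each commit builds; run-time regressions still need testing by hand.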
Mailing Lists
It's helpful to at least skim the following mailing lists:
- linux-kernel@vger.kernel.org: traffic is several hundred posts a day, so don't subscribe unless you have some filtering set up to help you cope.
- linux-nfs@vger.kernel.org: Linux NFS client and server development.
- nfsv4@ietf.org: ietf NFSv4 working group.
Miscellaneous
Use the -s option to make; this eliminates most of its output so that you can see (potentially important) compiler warnings more easily.
GDB Tricks
When you encounter an oops, you'll get a call trace like:
Call Trace:
    [<ffffffff8110a3ef>] kfree+0x63/0xfc
    [<ffffffffa011e18e>] nfs_free_parsed_mount_data+0x24/0xc1 [nfs]
    [<ffffffffa0121743>] nfs_fs_mount+0x5ac/0x61c [nfs]
    [<ffffffff81116a8d>] mount_fs+0x69/0x158
    [<ffffffff810ea5ad>] ? __alloc_percpu+0x10/0x12
    [<ffffffff8112b8a8>] vfs_kern_mount+0x65/0xc4
    [<ffffffff8112bf07>] do_kern_mount+0x4d/0xdf
    [<ffffffff8112d6b7>] do_mount+0x64b/0x6af
    [<ffffffff8112cfd0>] ? copy_mount_options+0xcb/0x12e
    [<ffffffff8112d81e>] sys_mount+0x88/0xc2
    [<ffffffff81426be9>] system_call_fastpath+0x16/0x1b
There is a simple way to translate a <symbol>+<offset> pair to a line number:
- run gdb passing the correct kernel module (.ko) as the only argument
- use the "l * (<symbol>+<offset>)" command
$ gdb obj/fs/nfs/nfs.ko
...
    (gdb) l * (nfs_free_parsed_mount_data+0x24)
    0x918e is in nfs_free_parsed_mount_data (/home/dros/build/src/fs/nfs/super.c:924).
    919
    920     static void nfs_free_parsed_mount_data(struct nfs_parsed_mount_data *data)
    921     {
    922             if (data) {
    923                     kfree(data->client_address);
    924                     kfree(data->mount_server.hostname);
    925                     kfree(data->nfs_server.export_path);
    926                     kfree(data->nfs_server.hostname);
    927                     kfree(data->fscache_uniq);
    928                     security_free_mnt_opts(&data->lsm_opts);
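When a trace has many frames, the gdb commands can be generated rather than typed. The sed expression below is a sketch that assumes one frame per line in the "[<address>] symbol+0xoffset/0xsize" format of the trace above; trace_to_gdb is a hypothetical helper name.

```shell
# Read call-trace lines on stdin and print one gdb "list" command per
# frame. Lines without a symbol+offset/size field are skipped.
trace_to_gdb() {
    sed -n 's/^.* \([A-Za-z_][A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)\/0x[0-9a-f]*.*$/l *(\1+\2)/p'
}
```

Feed it only the frames that belong to the object loaded into gdb. For example, with the trace saved in a file (here called oops.txt for illustration), "grep '\[nfs\]' oops.txt | trace_to_gdb" prints commands that can be pasted at the (gdb) prompt for nfs.ko; frames in the core kernel need vmlinux instead.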