GFS2 Setup Notes

From Linux NFS



Initial install


Started with fresh installs of RHEL5.0 on 4 nodes of mixed hardware, all attached to a shared MSA-1000 fibre channel 8-disk array (in two sets of 4, ~550GB total).

  • installed cluster and update RPMs from Wendy Cheng:
    • cman-2.0.64-1.el5.x86_64.rpm
    • cman-devel-2.0.64-1.el5.x86_64.rpm
    • device-mapper-1.02.13-1.el5.x86_64.rpm
    • gfs-utils-0.1.11-3.el5.x86_64.rpm
    • gfs2-utils-0.1.25-1.el5.x86_64.rpm
    • gnbd-1.1.5-1.el5.x86_64.rpm (unused?)
    • kmod-gfs-0.1.16-
    • kmod-gnbd-0.1.3- (unused?)
    • lvm2-2.02.16-3.el5.x86_64.rpm
    • lvm2-cluster-2.02.16-3.el5.x86_64.rpm
    • openais-0.80.2-1.el5.x86_64.rpm
    • openais-devel-0.80.2-1.el5.x86_64.rpm
    • system-config-cluster-1.0.39-1.0.noarch.rpm (just a python frontend for several vg*, lv*, and pv* commands)

Configuring cman and clvmd

  • cman: at first I tried using system-config-cluster to set up cman, but given that I didn't have any complicated fencing or quorum-related needs, I basically just took a generic cluster.conf and edited it. My cluster.conf is real basic and has manual fencing set up to be a no-op (I'd get complaints from the daemons if I didn't have any fencing set up).
    • distribute the new cluster.conf to all nodes; on the first run, you can just use scp or whatever.
    • once the cluster's up, though, propagating and setting changes on all nodes takes two steps. From the node with the updated configuration, do:
      • $ sudo ccs_tool update /path/to/new/cluster.conf (pushes to all nodes listed in conf file)
      • $ sudo cman_tool version -r <new-version-number> (a generation number to keep the nodes synched)
  • clvmd: as before, I tried using system-config-lvm to set up clvmd, but it's not quite "there yet" -- it'd get wedged or go blind to clustered volumes at strange times. Again, tweaking a mostly-templated (and very well-commented) stock conf file wasn't hard; my lvm.conf is real simple. Note: btw, in my setup the MSA-1000 disk array is initially set up to do raid0 on the 8 disks in two groups of 4; my machines see 2 block devices, each with a capacity of ~270GB.
    • create 1 physical linux (0x83) partition each, using whole "disk"; repeat for /dev/sdc
      • $ sudo fdisk /dev/sdb
    • create physical volumes with LVM2 metadata
      • $ sudo pvcreate -M 2 /dev/sdb1
      • $ sudo pvcreate -M 2 /dev/sdc1
    • create a clustered volume group and add /dev/sdb1 to it
      • $ sudo vgcreate -M 2 -l 256 -p 256 -s 4m -c y VolGroupCluster /dev/sdb1
      • $ sudo pvscan # (verify it worked)
    • edit lvm.conf and make sure that "locking_type" is set to 3 (DLM).
    • distribute lvm.conf to all the nodes
    • start up both cman and clvmd everywhere. Note: fwiw, I use pdsh, the parallel distributed shell, to communicate to all nodes at once; I have mine use ssh for transport. E.g., from my .bashrc:
      • $ alias start-cluster='for svc in cman clvmd ; do pdsh -w node[1-4] sudo service $svc start; done'
    • add /dev/sdc1 to the existing volume group (needs the daemons running)
      • $ sudo vgextend VolGroupCluster /dev/sdc1
      • $ sudo vgs # (verify that the "clustering" flag is set on the volgroup)
    • create a logical volume using the whole volgroup
      • $ sudo lvcreate -n ClusterVolume -l 138924 VolGroupCluster
      • $ sudo lvdisplay -c -a # (verify that it worked)
    • create a GFS2 filesystem therein
      • $ sudo gfs2_mkfs -j 4 -p lock_dlm -t GFS2_Cluster:ClusterFS -O /dev/VolGroupCluster/ClusterVolume
    • edit /etc/fstab to add a mountpoint, restart the daemons, and mount!
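
The two-step cluster.conf update from the cman bullet above can be wrapped in a small helper. This is only a sketch: it assumes the generation number lives in a config_version attribute on the <cluster> element of cluster.conf, and the function names (current_version, bump_version, push_conf, run) are made up for illustration. By default it only prints the ccs_tool and cman_tool commands rather than running them.

```shell
#!/bin/sh
# Sketch: bump config_version in a cluster.conf, then show the two-step push.
# DRY_RUN defaults to 1, so the cluster commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the config_version attribute of the <cluster> element (assumed layout).
current_version() {
    sed -n 's/.*<cluster [^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Rewrite the file in place with config_version + 1 and print the new value.
bump_version() {
    conf="$1"
    old=$(current_version "$conf")
    [ -n "$old" ] || { echo "no config_version found in $conf" >&2; return 1; }
    new=$((old + 1))
    tmp=$(mktemp) || return 1
    sed "s/\(<cluster [^>]*config_version=\)\"$old\"/\1\"$new\"/" "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    rm -f "$tmp"
    echo "$new"
}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Bump the version, then do the two steps described in the text.
push_conf() {
    conf="$1"
    new=$(bump_version "$conf") || return 1
    run sudo ccs_tool update "$conf"
    run sudo cman_tool version -r "$new"
}
```

Sourcing this and running push_conf /path/to/new/cluster.conf edits the file and prints the two commands; setting DRY_RUN=0 first would actually run them from the node holding the updated file.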

Custom kernels

Once the basics were going, I built some kernels and things more or less worked, except that I had a heck of a time getting the QLogic firmware to load properly. I'm fine with building the initramfs "initrds" by hand, but the firmware in this setup defeated me; I don't know, I guess I'm a udev idiot or something. What I ended up doing was bogarting a vendor patch from Red Hat (bless their hearts ;) that side-stepped the issue and just built the blobs into the GFS kernel module. A slightly-updated version against is available.

Upgrading GFS2 userland for kernels >2.6.18

Not too long after the initial install (which came with a 2.6.18-based kernel), I found that the in-kernel DLM (distributed lock manager) code had changed in newer kernels and required a corresponding update to the userspace LVM2 (logical volume manager) tools.

While Wendy Cheng had gotten things off the ground by giving me the bag of RPMs, we didn't get any RHN entitlements, so no updates = pain in the neck. I did finally manage to find a way to sneak RHEL5 packages out of RHN despite the lack of entitlement, but I had to do it by hand and I had to re-login for each package. Worse, when I finally did get the newest RPMs, they weren't even new enough anyway. Lesson learned: build from source.

I wasn't sure that it was the best idea, but since I already had GFS2 working with the stock userland, I was skittish and didn't want to clobber the system RPMs, so I installed under my home directory. That worked fine.

  • export CLUSTER=/home/richterd/projects/nfs/CLUSTER; cd $CLUSTER
  • mkdir device-mapper-OBJ cluster-OBJ LVM2-OBJ
  • device-mapper:
    • ./configure --prefix=$CLUSTER/device-mapper-OBJ && make && sudo make install
    • add $CLUSTER/device-mapper-OBJ/lib to /etc/ and rerun ldconfig
  • openAIS:
    • edit the Makefile; set DESTDIR to the empty string
    • make && sudo make install -- at some point, this clobbered some of the RPM stuff; meh.
    • added /usr/lib64/openais to and reran ldconfig
    • update: when I came back to this and was building on Fedora 9, I got complaints about struct ucred not being defined (see this bugreport). I edited $OPENAIS/exec/Makefile and added -D_GNU_SOURCE to its CFLAGS and things seem copacetic.
  • libvolume_id-devel:
    • sudo rpm -ivh libvolume_id-devel-095-14.5.el5.x86_64.rpm
  • cluster tools:
    • ./configure --prefix=$CLUSTER/cluster-OBJ --openaislibdir=/usr/lib64/openais --dlmincdir=/lib/modules/<kernel>/source/include
      • update: am now omitting a couple things (gfs doesn't build right any longer anyway): ./configure --without_gnbd --without_gfs
    • edit dlm/lib/Makefile and add: CFLAGS += -I$(dlmincdir)
      • update: on a different install, libdlm.h kept hiding. Edited make/ and added a line like CFLAGS += -I$(SRCDIR)/dlm/lib.
    • since I was doing my "trial" install, I added $CLUSTER/cluster-OBJ/usr/lib to and reran ldconfig. I anticipate going back and installing things in real system locations now that I know things worked :)
    • make && sudo make install
  • LVM2:
    • ./configure --prefix=$CLUSTER/LVM2-OBJ --with-lvm1=none --with-dmdir=$CLUSTER/device-mapper-OBJ --with-clvmd=cman
    • edit make.tmpl and look for where the above dmdir is set; my configure screwed up and appended "/ioctl" to the end and I had to trim it.
      • fix: rather, first trim from, where it originates for whatever reason
    • make && sudo make install

.. at this point, I had a clvmd that linked against the right shared libraries and that could deal with the kernel's modified DLM setup.
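
One way to double-check the "linked against the right shared libraries" part is to filter ldd output for the cluster-related libraries and eyeball where each one resolves. A minimal sketch follows; the library name prefixes (libdlm, libcman, libdevmapper) and the clvmd install path in the usage note are assumptions, and cluster_libs is a made-up helper name.

```shell
#!/bin/sh
# Sketch: show where the cluster-related shared libraries of a binary resolve.
# Reads ldd-style output on stdin, e.g. "libfoo.so.1 => /path/libfoo.so.1 (0x...)".
# The library name prefixes below are assumptions about what clvmd links against.
cluster_libs() {
    awk '$2 == "=>" && $1 ~ /^lib(dlm|cman|devmapper)/ {
        # ldd prints "=> not found" when a library cannot be resolved
        print $1, ($3 == "not" ? "NOT FOUND" : $3)
    }'
}
```

Usage would be something like ldd $CLUSTER/LVM2-OBJ/sbin/clvmd | cluster_libs (assuming clvmd landed under sbin in the prefix). Each library should point into the trial install directories or /usr/lib64/openais rather than at the stock RPM copies.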

Troubleshooting the clustering flag

Problem 1: LVM changes don't appear to "take". Quoting from an email I found online (XXX: cite):

 Why aren't changes to my logical volume being picked up by the rest of the cluster?
 There's a little-known "clustering" flag for volume groups that should be set on when a cluster uses a shared volume. 
 If that bit is not set, you can see strange lvm problems on your cluster. For example, if you extend a volume with 
 lvresize and gfs_grow, the other nodes in the cluster will not be informed of the resize, and will likely crash when 
 they try to access the volume.
 To check if the clustering flag is on for a volume group, use the "vgs" command and see if the "Attr" column shows 
 a "c". If the attr column shows something like "wz--n-" the clustering flag is off for the volume group. If the 
 "Attr" column shows something like "wz--nc" the clustering flag is on.
 To set the clustering flag on, use this command:  vgchange -cy
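
The "Attr" check described in the quote is easy to script. The sketch below assumes the six-character attribute layout from the quoted examples ("wz--n-" versus "wz--nc"), with the clustering flag in the sixth position; is_clustered_attr and report_clustering are hypothetical helper names.

```shell
#!/bin/sh
# Return 0 if a vgs "Attr" string has the clustering flag set.
# Assumes the layout from the quoted examples: "c" in the sixth position.
is_clustered_attr() {
    case "$1" in
        ?????c*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Read "name attr" pairs on stdin and report each volume group's status.
report_clustering() {
    while read -r vg attr; do
        [ -n "$vg" ] || continue
        if is_clustered_attr "$attr"; then
            echo "$vg: clustered"
        else
            echo "$vg: NOT clustered"
        fi
    done
}
```

It can be fed with sudo vgs --noheadings -o vg_name,vg_attr | report_clustering. On a setup like the one described here, the shared volume group should report as clustered and the local-disk volume group should not.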

Problem 2: In the midst of adding a new node to the cluster, clvmd wouldn't start on other nodes and recognize the disk array.

I tried the above vgchange -cy thing and screwed it up by making the local disk's VG clustered (ugh). The problem made sense, but temporarily changing the locking type was the step I was missing when I tried to undo my mistake.

The fix: make sure uniform lvm.confs are tweaked as per the link above and distributed to the cluster; start cman/clvmd everywhere; then use vgchange -cn VolGroup00 to remove the clustering flag (VolGroup00 is the local disk's VG, set up during the RHEL install); then set the lvm.conf locking stuff back to "clustered" and redistribute to the cluster; then restart the daemons, mount, declare victory.
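
Written out in order, the fix looks roughly like the sketch below. It is a dry run by default and only prints the commands. The lvm.conf edits are left as manual steps because the exact temporary locking change comes from the linked instructions and is not reproduced here. The pdsh host range mirrors the start-cluster alias from earlier; the function names and the stop order (clvmd before cman) are my own assumptions.

```shell
#!/bin/sh
# Sketch of the recovery sequence; prints commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
NODES="${NODES:-node[1-4]}"          # pdsh host range, as in the start-cluster alias
LOCAL_VG="${LOCAL_VG:-VolGroup00}"   # the local disk's VG that got flagged by mistake

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Announce a step that has to be done by hand; wait for Enter on a real run.
manual() {
    echo "# manual step: $*"
    if [ "$DRY_RUN" != "1" ]; then
        printf 'press Enter when done... '
        read -r reply
    fi
}

start_daemons() {
    for svc in cman clvmd; do run pdsh -w "$NODES" sudo service "$svc" start; done
}

stop_daemons() {
    # assumption: stop in the reverse of the start order
    for svc in clvmd cman; do run pdsh -w "$NODES" sudo service "$svc" stop; done
}

unflag_local_vg() {
    manual "apply the temporary lvm.conf locking change and copy lvm.conf to all nodes"
    start_daemons
    run sudo vgchange -cn "$LOCAL_VG"
    manual "set lvm.conf locking back to clustered and copy lvm.conf to all nodes"
    stop_daemons
    start_daemons
    manual "mount the GFS2 filesystem"
}
```

Calling unflag_local_vg with the defaults prints the whole sequence so it can be sanity-checked before anything is touched.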
