July 19, 2008

Monitoring: Detecting Flaps

Monitoring software typically supports raising an alarm following a certain number of consecutive breaches: three unsuccessful HTTP requests to a web server, or two consecutive ping failures. However, this will miss conditions where the alarm state flaps. That is, the monitor enters and then leaves the alarm state on successive checks, or often enough that the required number of consecutive breaches is never reached.

Given two connections, a ping check could be made on each every five minutes, with an alarm raised after three consecutive connection failures. Here is an example over 40 minutes of monitoring:

Time   Connection #1        Connection #2
       ok      failures     ok      failures
0      true    0            true    0
5      true    0            false   1
10     false   1            true    0
15     false   2            false   1
20     false   3            true    0
25     false   4            false   1
30     false   5            true    0
35     false   6            false   1
40     true    0            true    0

An alarm is raised for connection #1 at 20 minutes due to three failures. However, connection #2 will never raise an alarm, as it never accumulates enough consecutive failures. Note that between the 5th and 35th minute inclusive, it tallied four errors to only three successes.

Experimental alarm raising code in the test-monitoring script reveals that delaying the alarm reset exposes flaps to an otherwise simple consecutive alarm detector. Using a running average to detect flapping seems too complicated.

$ perl test-monitoring
conn#1 at 20 alarm on simple
conn#1 at 20 alarm on simple_delayreset
conn#1 at 25 alarm on simple
conn#1 at 25 alarm on simple_delayreset
conn#1 at 30 alarm on simple
conn#1 at 30 alarm on simple_delayreset
conn#1 at 35 alarm on simple
conn#1 at 35 alarm on simple_delayreset
conn#2 at 25 alarm on simple_delayreset
conn#2 at 35 alarm on simple_delayreset
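The test-monitoring script itself is not shown here, so the following Python sketch is only one way a delayed reset might work. It assumes the failure count is cleared only after a set number of consecutive successes; the function name alarm_times and the reset_after parameter are hypothetical, not taken from the script.

```python
def alarm_times(checks, threshold=3, reset_after=1, interval=5):
    """Return the times at which a failing check meets the alarm threshold.

    checks      -- list of booleans, True meaning the check succeeded
    threshold   -- accumulated failures needed before an alarm is raised
    reset_after -- consecutive successes needed to clear the failure count;
                   1 gives the simple consecutive detector, 2 or more delays
                   the reset (an assumed rule, not taken from test-monitoring)
    interval    -- minutes between checks
    """
    failures = 0
    successes = 0
    alarms = []
    for i, ok in enumerate(checks):
        if ok:
            successes += 1
            if successes >= reset_after:
                failures = 0
        else:
            successes = 0
            failures += 1
            if failures >= threshold:
                alarms.append(i * interval)
    return alarms


# Check results from the table above, one per five minutes.
conn1 = [True, True, False, False, False, False, False, False, True]
conn2 = [True, False, True, False, True, False, True, False, True]

for name, checks in (("conn#1", conn1), ("conn#2", conn2)):
    print(name, "simple", alarm_times(checks))
    print(name, "simple_delayreset", alarm_times(checks, reset_after=2))
```

With reset_after=2, a single success between failures no longer clears the count, so connection #2 reaches the threshold at 25 minutes and alarms again at 35, while the simple detector never alarms on it.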

Without flap detection, monitoring will usually alarm only minutes or hours into an incident, typically when a backlog or some other condition created by the flapping reaches alarm levels. Therefore, flapping detection should be added to network and other connectivity tests, where sporadic failures over time will cause flapping.

Update: T_____ pointed out Detection and Handling of State Flapping in Nagios 3.

July 06, 2008

Hard Linked Directories

Early Unix systems only supported hard links, and lacked mkdir(2). This meant directory creation involved three system calls: mknod(2), and then link(2) twice to create the special . and .. directory entries. Worse, this sequence of calls was not atomic, leading to race conditions. Modern Unix supports the atomic mkdir(2) call, though some systems may still allow ln(1) to hard link directories.

ln(1) on OpenBSD claims the inability to hard link directories, and the manual shows no options to influence this restriction:

$ mktemp -d test.XXXXXXX
test.LR31932
$ ln test.LR31932 dirhl
ln: test.LR31932: is a directory
$ sudo ln test.LR31932 dirhl
ln: test.LR31932: is a directory
$ rmdir test.LR31932
$ man ln | perl -00 -ne 's/\010.//g;' \
  -e 'print if m/^\s+-\w/'
     -f      Unlink any already existing file, permitting the link to occur.
     -h      If the target is a symlink to a directory, do not descend into it.
     -n      An alias for -h for compatibility with other operating systems.
     -s      Create a symbolic link.

However, a search on the error message reveals an undocumented option to ln(1):

$ find /usr/src -type f -follow | \
  xargs grep -l 'is a directory' 2>/dev/null | \
  grep ln
/usr/src/bin/ln/ln.c
$ grep -3 'is a directory' /usr/src/bin/ln/ln.c
        }
        /* Only symbolic links to directories, unless -F option used. */
        if (!dirflag && S_ISDIR(sb.st_mode)) {
                warnx("%s: is a directory", target);
                return (1);
        }
…

In addition to -F, there is also a kernel limitation against hard linking directories, as sudo ln -F … results in an Operation not permitted error. This is likely for the best, as find(1) and du(1) type programs might not handle a cyclical directory tree well.
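The kernel restriction can be probed without ln(1) at all. A minimal sketch in Python, whose os.link wraps link(2): it tries to hard link a scratch directory and reports the error name. The function name and the dir and dirhl paths are made up for illustration, and which error comes back (if any) depends on the operating system and filesystem.

```python
import errno
import os
import tempfile


def try_directory_hard_link():
    """Try link(2) on a scratch directory.

    Returns the symbolic errno name (for example "EPERM") if the kernel
    refuses, or None if the link was created.
    """
    with tempfile.TemporaryDirectory() as parent:
        target = os.path.join(parent, "dir")
        link = os.path.join(parent, "dirhl")
        os.mkdir(target)
        try:
            # os.link calls link(2) directly, bypassing the checks in ln(1).
            os.link(target, link)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return None


print(try_directory_hard_link())
```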

Uresh Vahalia’s UNIX Internals: The New Frontiers covers hard links and filesystems in excellent detail.

June 24, 2008

North Facing Windows

In my continued reading of the most excellent Light: Science and Magic: An Introduction to Photographic Lighting, and given my continued lack of lighting equipment beyond an SB-600 and various household lights of annoyingly different color temperatures, I have found that North facing windows provide excellent lighting, reproducing the effect of a large softbox.

  • North facing window, plus diffuse reflection off the white table, plus random background light from the rest of the room:

    花の座

  • A different set of North facing windows, plus diffuse reflection off various pieces of white paper that fill in shadows opposite the windows. This lighting is best revealed by inspecting the helmet in the lower right. The white paper provided perhaps too much reflection, as the lemon and helmets are a bit too bright; grey paper might have worked better (coat cardboard in tinfoil if more light is required).

    Breakfast

  • For comparison, mostly direct morning sunlight (harder shadows, sharper highlights) through an East facing window. One shot is backlit by an opened Moleskine; otherwise, the pictures share the same exposure and B&W conversion:

    Espresso-Cup1 Espresso-Cup2

Finding good lighting requires looking around and experimenting with moving things closer to or farther from the light source: what does the subject look like right at the window versus near it versus farther away? Does changing the camera position with respect to the subject at these different locations improve the image? Do reflected or other light sources help fill in shadows appropriately? Outside, the time of day, season, and weather might require months to line up just right…

June 22, 2008

Disk Space

$ df; df -i; sudo lsof | grep deleted

On Unix systems, when disk space runs low, the first three commands to run are df, which shows how much space file contents occupy; df -i, which shows whether the available inodes have been consumed; and sudo lsof | grep deleted, which shows what deleted files remain held open by a running process. Using this information, partitions can then be investigated for the source of the problem, or processes holding a large file open can be restarted.

Note that some file systems will never run out of inodes, as they use B+ tree or other solutions instead of a count set when the filesystem was created.
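For scripted checks, the same two numbers can be read without parsing df output. A rough sketch in Python using shutil.disk_usage and os.statvfs; the function name space_report is hypothetical, and the percentages may differ slightly from what df prints, since df accounts for reserved blocks.

```python
import os
import shutil


def space_report(path="/"):
    """Return (percent of space used, percent of inodes used) for the
    filesystem holding path. The inode figure is None when the filesystem
    reports no fixed inode count."""
    usage = shutil.disk_usage(path)
    vfs = os.statvfs(path)
    space_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    # f_files is the total inode count; some filesystems report 0 here.
    if vfs.f_files:
        inode_pct = 100.0 * (vfs.f_files - vfs.f_ffree) / vfs.f_files
    else:
        inode_pct = None
    return space_pct, inode_pct


space_pct, inode_pct = space_report("/")
print("space used: %.1f%%" % space_pct)
print("inodes used:", "n/a" if inode_pct is None else "%.1f%%" % inode_pct)
```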

Monitoring

Disk space usage should be monitored with software such as Nagios. Alarming is best done at two levels: a non-paging ticket when disk space reaches warning levels, and a paging ticket when disk space is critically short. Using this model, runaway disk space use will quickly cause a page, while slow consumption of space can be acted on well before a page is necessary.

Different alarm levels must be set for different classes of systems: database servers might well run certain partitions up to 99% full under normal use, while other systems may require a warning ticket well before 90% usage.
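The two-level model with per-class thresholds might be sketched as below. The class names and percentages are placeholders I made up for illustration; real values depend on the systems involved.

```python
# Hypothetical (warning, critical) thresholds, in percent of space used.
THRESHOLDS = {
    "default": (80.0, 90.0),
    "database": (99.5, 99.9),
}


def classify(percent_used, system_class="default"):
    """Map a usage percentage to "ok", "ticket" (non-paging), or "page"."""
    warning, critical = THRESHOLDS.get(system_class, THRESHOLDS["default"])
    if percent_used >= critical:
        return "page"
    if percent_used >= warning:
        return "ticket"
    return "ok"


print(classify(85.0))              # ticket
print(classify(99.0, "database"))  # ok
```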

Remediation

In my experience, logging causes the majority of disk space problems. Eliminate verbose and debug logging in production, as these messages hardly justify the disk space and I/O operations required. Where possible, eliminate stack trace spam: a remote connection failure warning never needs 100 lines saying where in the code the failure took place; a single line suffices, as the problem has nothing to do with the code.

Estimate disk space consumption: the amount of data logged per transaction or request in production should be known, and can be multiplied by the anticipated traffic level and by the duration logs must remain around, plus padding for growth and filesystem overhead. This should be done before the service launches, not afterwards!
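As a worked example of that estimate, here is a small Python sketch. The function name, the 25% default padding, and the sample figures are all assumptions chosen for illustration.

```python
def estimate_log_bytes(bytes_per_request, requests_per_day,
                       retention_days, padding=0.25):
    """Estimate the disk space needed for logs, in bytes.

    padding is a fractional allowance for growth and filesystem overhead
    (0.25 means 25% extra); the default is an arbitrary illustration.
    """
    raw = bytes_per_request * requests_per_day * retention_days
    return int(raw * (1 + padding))


# Made-up figures: 500 bytes logged per request, two million requests a
# day, logs kept for 30 days.
needed = estimate_log_bytes(500, 2_000_000, 30)
print(needed / 1e9, "GB")
```

With these figures the raw volume is 30 GB, and the padding brings the estimate to 37.5 GB.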

Use proper log file handling, not problematic logrotate implementations. Rotation from messages.1 to messages.2 renames every file on each rotation, which thwarts rsync style archival, and does not divide the logs into time-based buckets as better solutions do.
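One way to get time-based buckets is to name each log file after the period it covers, so a file never changes name once written. A minimal Python sketch; the function name log_path and the daily UTC bucket format are my assumptions, not a description of any particular tool.

```python
import time


def log_path(base, when=None, fmt="%Y-%m-%d"):
    """Return the log file name for the bucket containing `when`.

    when is seconds since the epoch (None means now); buckets are
    computed in UTC. The default fmt gives one file per day.
    """
    stamp = time.strftime(fmt, time.gmtime(when))
    return "%s.%s" % (base, stamp)


print(log_path("messages"))
print(log_path("messages", fmt="%Y-%m-%d-%H"))  # hourly buckets
```

Because older files keep their names, an rsync style copy only needs to transfer the newest bucket.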

June 12, 2008

Rollei Scanfilm CN 400

Rollei Scanfilm CN 400 is interesting:

Loading Dock

Holga, minus the square plastic insert. Film warping took place during the exposure, the scanning, or both. I like the results better than those from comparable Fuji color films, and this film suits different applications than super-saturated Velvia.