
Chapter 9: Troubleshooting

9.1 Debugging

Preliminary meanings of Debug option settings are:

To combine these settings, add their values (e.g. 7 enables all messages and log flushing).
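
As a minimal sketch (assuming the Debian/Ubuntu default configuration file name and location), the most verbose level could be enabled like this:

  # /etc/apt-cacher-ng/acng.conf
  # 7 combines the individual settings, i.e. all debug messages plus log flushing
  Debug: 7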

Getting HTTP headers from apt-get works like this:

apt-get update -o Debug::Acquire::Http=true

9.2 Problem: cacher keeps delivering damaged files

Even in this millennium, sometimes damaged files are downloaded from the server and are stored in the cache. Sometimes lazy maintainers of 3rd party archives replace package files with the same name but different contents. Sometimes the server's file system gets corrupted without detection by the OS.

Anyhow, there might be cases where cached data becomes invalid. Volatile files might be replaced by a fixed version on some future download, but static package files are never changed after completion, and even incomplete downloads are resumed, keeping any bad data downloaded before.

Usually the damage is only discovered by the client later. A particular file can be located in the cache and replaced manually, but if there are many of them, a mass file check might be needed to clean up the mess. Fortunately, there are helpers in the cache maintenance interface to automate this process.
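
For a single known file, a manual fix is often enough. The sketch below assumes the Debian default cache directory and uses a purely hypothetical package name; apt-cacher-ng keeps a .head companion file next to each cached object, and both should be removed so the file is fetched again on the next request:

  # locate the (hypothetical) damaged package and its .head companion in the cache
  find /var/cache/apt-cacher-ng -name 'libfoo_1.2.3-1_amd64.deb*'
  # once verified, remove both so the file is re-fetched on the next request
  find /var/cache/apt-cacher-ng -name 'libfoo_1.2.3-1_amd64.deb*' -delete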

To start, visit the web control interface and check the options of the Expiration task. Enable the check for explicit paths and the check of data contents, then start the expiration. With these parameters, complete files with an incorrect checksum are detected. The default action for such files is adding them to a list of damaged files. After that, the "Delete damaged files" button on the main web page can be used to remove them (or the Show button to display them first). Alternatively, the checkboxes appearing beside each damage report can be used together with the control buttons which appear at the end of the report. Yet another way of dealing with them is truncating them (setting them to zero size). This can be done on the fly when enabled by the expiration parameters, or with the appropriate command button in the web interface.

NOTE: several index files and related support files can create false positives, i.e. be reported as incomplete or bad files. This usually happens because their volatile contents have changed but the file was not downloaded for a while and another version of it was used instead (like a bzip2-compressed version instead of a gzip-compressed or uncompressed one). The default code attempts to detect files with good reasons to stay in the cache and does not mark them as damaged.

9.3 Problem: regular expiration action reproducibly aborts

A quick investigation of the action logs should help identify the problem. A typical cause is a mirror referenced somewhere which is not reachable when the expiration runs.

Unfortunately there is no simple and safe way to solve this. One method is disabling the ExAbortOnProblems configuration variable, but this can destroy the whole cache if a bigger problem with index files occurs and this state remains unnoticed for many days, until the ExTreshold period (see configuration) is over.

Another way is listing the index files of the faulty mirrors in a special file. It needs to be stored as "ignore_list" in the configuration directory and must contain one path name per line, with paths relative to the cache directory, as seen in the error messages.
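
A sketch of such a file (stored e.g. as /etc/apt-cacher-ng/ignore_list on Debian systems), using made-up mirror paths; real entries must match the paths reported in the error messages exactly, one per line:

  broken.mirror.example.org/debian/dists/unstable/main/binary-amd64/Packages.bz2
  broken.mirror.example.org/debian/dists/unstable/Release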

9.4 Problem: cacher suddenly terminates, log reports IO errors

For simplicity and memory saving reasons, apt-cacher-ng assumes that some files can be opened exclusively for reading and that they don't suddenly become unreadable. Unfortunately, in some conditions and in case of IO errors this doesn't hold, and the kernel delivers a fatal signal (SIGBUS) which eventually terminates the program.

To track this down to the likely cause, it's possible to execute a custom script at the moment the fatal signal is received. To do that, add something like this to the configuration:

BusAction = ls -l /proc/$PPID/fd/ | mail -s "SIGBUS!" root

This command would send a list of the opened files with their paths to the "root" mail user, and root should then check the state of those files (for example by running them through md5sum or similar).

NOTE: Beware of the security implications of this configuration option. It runs regular shell code in context of the daemon user in a blocking way, so it may be vulnerable to symlink attacks and it will delay the automatic restart of the daemon through systemd.

9.5 Problem: download fails with 503 ... status message

Code 503 usually represents an internal failure which could not be described correctly by other HTTP status codes. In most cases it's caused by file system errors or an incorrect cache directory setup, like files or directories with an incorrect owner, missing read/write permissions for the effective user account, or other system-related problems like running out of disk space.

The log file apt-cacher.err located in the LogDir directory should document more details. In case it doesn't, setting the Debug config option to a higher value might reveal more information.
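
A few quick checks often narrow the cause down; the sketch below assumes the Debian/Ubuntu default cache and log locations:

  df -h /var/cache/apt-cacher-ng                    # enough free disk space?
  ls -ld /var/cache/apt-cacher-ng                   # correct owner and permissions?
  tail -n 50 /var/log/apt-cacher-ng/apt-cacher.err  # recent error details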

Fixing permission problems shouldn't be a real challenge for system administrators. Usually, a command set like this should do the trick on Debian/Ubuntu systems, assuming that all users in the daemon's group should receive write access to the cache files:

 chown -R apt-cacher-ng:apt-cacher-ng /var/cache/apt-cacher-ng
 chmod -R a+rX,g+rw,u+rw /var/cache/apt-cacher-ng

9.6 Problem: apt-get freezes when downloading files

Solution: First, check:

If nothing helps, then you may have hit a spooky problem which is hard to track down. If you like, help the author identify the problem. To do that, run:

  su -
  # enter root password
  cd /tmp
  apt-get source apt-cacher-ng
  apt-get build-dep apt-cacher-ng
  cd apt-cacher-ng-*
  ./build.sh DEBUG
  /etc/init.d/apt-cacher-ng stop
  builddir/apt-cacher-ng -c /etc/apt-cacher-ng logdir=/tmp foreground=1 debug=7
  # (let apt-get run now; on timeouts, just wait well over 20 seconds)
  # stop the daemon with Ctrl-C
  /etc/init.d/apt-cacher-ng start
  # compress /tmp/apt-cacher.err and send it to author

The value of debug can be varied to get different verbosity (see section 9.1 for more information about Debug levels).

9.7 apt-get reports corrupted bzip2 data

Symptoms: apt-get fails to run through "update" no matter what you do, and you may get a message like this one:

99% [6 Packages bzip2 0] [Waiting for headers] [Waiting for headers]
bzip2: Data integrity error when decompressing.
        Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

Err http://debian.netcologne.de unstable/main Packages              
  Sub-process bzip2 returned an error code (2)
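
This usually points to a damaged index file stored in the cache; the expiration procedure from section 9.2 can detect and remove such files. As a quick manual check, the cached copy can also be tested directly with bzip2 (the path below is only an illustration and depends on the mirror and remapping settings in use):

  # hypothetical cache path, adjust to the mirror named in the error message
  bzip2 -t /var/cache/apt-cacher-ng/debian.netcologne.de/debian/dists/unstable/main/binary-i386/Packages.bz2
  # if the test fails, remove the file and its .head companion so it is re-fetched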

9.8 Problem: apt-cacher-ng refuses to start with "Address already in use"

Another service is already listening on the port which apt-cacher-ng is configured to use. This might be the apt-cacher daemon, which uses the same port number by default. To identify the process behind that port, use the fuser utility, executing it as root for both the IPv4 and IPv6 protocol versions. Example:

fuser -4 -v -n tcp 3142
fuser -6 -v -n tcp 3142
                    USER        PID ACCESS COMMAND
3142/tcp:           xwwwfsd   17914 F....  xwwwfsd

(where 3142 is the port number from the apt-cacher-ng configuration file). To resolve the collision, reconfigure the other daemon or apt-cacher-ng to use another free port (and reconfigure the clients to use the new apt-cacher-ng port accordingly).
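
For example, to move apt-cacher-ng to another port, the Port setting in the main configuration file can be changed; 3143 below is just an arbitrary free port and acng.conf is the Debian default file name:

  # /etc/apt-cacher-ng/acng.conf
  Port: 3143

After changing it, restart the daemon (e.g. with systemctl restart apt-cacher-ng) and adjust the proxy settings of the clients to the new port.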


Comments to blade@debian.org
[Eduard Bloch, Sun, 19 Apr 2015 10:25:49 +0200]