
14. Appendix B - Troubleshoot Problems

 
                T R O U B L E S H O O T I N G

If you run into any problem during installation or when using this
package, please first read the following text and all other relevant
documentation. In particular, consult your server's documentation if
you run into problems setting up your server. Also refer to your
network card's user manual or to the documentation for the operating
systems of the diskless clients. If you still can't solve the problem
on your own, you can send me an email at

                gero@gkminix.han.de

Users who speak German can write to me in German; otherwise, please
write in English. I have already received emails in such poor English
that I couldn't even understand the problem, and I can't help you in
that case. Please also excuse me for not answering questions sent by
postal mail or by telephone; I simply don't have the time to deal with
them.
If you decide to send me an email, please describe your problem as
exactly as possible. It usually helps to include the relevant portions
of your configuration files (I pay for my internet access myself, so
please keep quotations as short as possible). With bootrom problems in
particular, it usually helps to write down the screen output _exactly_,
including, but not limited to, any error messages. Also describe as
precisely as possible how you produced the problem so that I can try
to reproduce it on my own hardware.
Please also note that I can't help you with every problem with your
server, as there are so many different systems on the market. The same
is true for problems with network cards: I can't afford to buy every
card on the market for testing. Personally, I use NE2000 and WD8013
cards, so I can probably help you with those.
If you find a problem that looks like a bug in the code, I would really
appreciate a short note from you, and even more so if you also have a
fix for it.
Besides contacting me directly, you can also subscribe to a mailing list
about network booting. Send a mail with the message 'subscribe netboot'
in its body to majordomo@baghira.han.de (the subject of the mail doesn't
matter). The readers of the mailing list should also be able to help you
with any problem you might have while setting up a diskless client. I
also announce every new version of this netboot package on the mailing
list.




Problem: My operating system OS/XY is not supported by netboot

        I would gladly provide support for every operating system on the
        market, but I don't have the resources to do this. However, if
        you want a particular operating system to be supported, you
        should get in touch with me. In any case, you will have to provide
        me with a valid, licensed copy of that operating system. You are
        also invited to write your own boot loader and send it to me for
        inclusion in netboot under the terms of the GNU GPL.



Problem: While trying to build a bootrom I get a compiler error

        The installation scripts need to compile a couple of utility
        programs which are only used while building the bootrom. They
        should compile on any Unix-type system, so if you get an error,
        please report it to me even if you are able to fix it yourself,
        so that I can include a patch in future releases.



Problem: I get an error from make saying something like "missing delimiter"

        Some of the Makefiles use ifdef directives, which older make
        programs don't understand. Even some more "modern" systems like
        SCO OpenServer 5 have this problem. In that case you will have to
        get and install GNU make on your system (which is the better
        choice anyway).



Problem: The bootrom doesn't start up at all

        Either you have a floppy in your diskette drive, or you have
        a hard disk installed with a partition marked as active and the
        bootrom has been built so that it lets the BIOS look for active
        partitions first. In both cases the system boots from that
        bootable medium instead of using the bootrom. Just remove the
        floppy, or use fdisk to mark all partitions as unbootable (i.e.
        inactive). Alternatively, you can build the bootrom so that it
        does not allow the BIOS to look for bootable partitions. The
        program which actually creates the bootrom ('makerom', which gets
        called when you run 'make bootrom') asks you about this right
        after you select the bootrom kernel image.
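
        To see whether a hard disk still carries an active partition, it
        helps to know where the flag lives: in a classic PC master boot
        record, the partition table starts at byte 446 and holds four
        16-byte entries, and the first byte of each entry is 0x80 for an
        active partition. The sketch below only reads such a 512-byte
        buffer; the function name count_active_partitions is hypothetical
        and not part of netboot, and fdisk remains the tool for actually
        changing the flag.

```c
/* Hypothetical read-only checker, not part of netboot: counts active
 * partitions in a copy of a classic PC master boot record (MBR). */
#define MBR_SIZE          512
#define PART_TABLE_OFFSET 446   /* 0x1BE: start of the partition table */
#define PART_ENTRY_SIZE   16
#define PART_COUNT        4
#define BOOT_FLAG_ACTIVE  0x80  /* 0x00 means inactive */

/* Returns the number of entries marked active, or -1 if the buffer
 * lacks the 0x55 0xAA boot signature in its last two bytes. */
int count_active_partitions(const unsigned char mbr[MBR_SIZE])
{
    int i;
    int active = 0;

    if (mbr[MBR_SIZE - 2] != 0x55 || mbr[MBR_SIZE - 1] != 0xAA)
        return -1;
    for (i = 0; i < PART_COUNT; i++) {
        /* the first byte of each 16-byte entry is the boot indicator */
        if (mbr[PART_TABLE_OFFSET + i * PART_ENTRY_SIZE] == BOOT_FLAG_ACTIVE)
            active++;
    }
    return active;
}
```

        A result greater than zero means that a bootrom which lets the
        BIOS look for active partitions will boot from the hard disk.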



Problem: The bootrom behaves strangely during startup, and may even hang
         the whole system

        If you compiled the mknbi programs on a system with big-endian
        byte order (like Motorola or PPC systems), this might indicate
        that the configuration program couldn't determine the correct
        byte order. It might also be that there is a bug in the byte
        ordering code. Some systems, like SPARCs, also do not allow data
        accesses at misaligned addresses. 'configure' should usually
        detect these conditions. In any case, if 'configure' is not able
        to properly detect what kind of system you are using, edit the
        file config.h by hand and try again. Please report this
        condition, and also note which system you used for installation.
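
        'configure' normally performs checks like these for you. As an
        illustration of what it has to find out, here is a minimal sketch
        in C, assuming hypothetical helper names that do not come from the
        netboot sources: a run-time byte order probe, and a store routine
        that writes a value in the PC's little-endian order byte by byte,
        which sidesteps both the byte order and the alignment problem.

```c
#include <string.h>

/* Returns 1 on a big-endian host (e.g. Motorola or PPC), 0 on a
 * little-endian host such as an Intel PC. */
int host_is_big_endian(void)
{
    unsigned int probe = 1;
    unsigned char lowest;

    /* copy out the byte stored at the lowest address of 'probe' */
    memcpy(&lowest, &probe, 1);
    return lowest == 0;
}

/* Stores a 32-bit value in little-endian (PC) byte order, one byte at
 * a time, so it works regardless of the host's byte order and never
 * performs a misaligned multi-byte store. */
void put_le32(unsigned char *dst, unsigned long value)
{
    dst[0] = (unsigned char)(value & 0xff);
    dst[1] = (unsigned char)((value >> 8) & 0xff);
    dst[2] = (unsigned char)((value >> 16) & 0xff);
    dst[3] = (unsigned char)((value >> 24) & 0xff);
}
```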



Problem: The packet driver is not able to start properly

        First check what error message the packet driver prints. Usually
        this problem is the result of an incorrect setup of the network
        card, so check that it uses an I/O address, interrupt line and DMA
        channel (if applicable) of its own, and that the packet driver
        uses the correct values. Another common problem with ethernet
        cards which use shared memory (like WD80?3 cards) is that this
        shared memory overlaps the ROM area used by the bootrom. Select a
        different shared memory address in that case. If all of that is
        correct, check next that you configured the packet driver
        correctly with the bootrom configuration program. The packet
        driver usually prints out what it expects the hardware to look
        like, so you can use this information to check your setup.
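
        Whether a shared memory window collides with the bootrom is simple
        interval arithmetic. The sketch below assumes you know the start
        address and size of both windows (from the card's setup and the
        bootrom socket configuration); the helper name and the example
        addresses are hypothetical.

```c
/* Hypothetical helper: returns 1 if two memory windows overlap.
 * Each window is given by its start address and its size in bytes;
 * the end address (start + size) is exclusive. */
int windows_overlap(unsigned long start_a, unsigned long size_a,
                    unsigned long start_b, unsigned long size_b)
{
    unsigned long end_a = start_a + size_a;
    unsigned long end_b = start_b + size_b;

    return start_a < end_b && start_b < end_a;
}
```

        For example, a 16kB shared memory window at 0xD0000 and an 8kB
        bootrom at 0xD2000 overlap (the function returns 1), while moving
        the shared memory down to 0xC8000 clears the conflict.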



Problem: The bootrom tells me that there is not enough memory but I have
         xx megabytes installed

        This problem is a result of the fact that the BIOS starts the
        bootrom in the processor's real mode. The bootrom is therefore
        only able to access the lower 1 megabyte of memory, regardless
        of how much you installed. And 384kB of this is reserved for
        ROMs and the video memory, so there is only 640kB left.
        Unfortunately, some systems even reserve memory from these lower
        640kB for internal BIOS data. This is called the extended BIOS
        data area, and it is known to be used on most PS/2 systems. Some
        other BIOSes also use such an extended BIOS data area, which can
        usually be turned on or off in the system's setup. Therefore you
        should try to turn this feature off. If that's not possible, you
        are out of luck - sorry.
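
        The 1 megabyte and 640kB limits follow directly from real-mode
        address arithmetic: a segment:offset pair addresses
        segment * 16 + offset, and the video memory starts at segment
        0xA000, which is linear address 640kB. The sketch below shows this
        arithmetic together with a rough estimate of what is left when the
        BIOS reserves an extended BIOS data area; the function names and
        the EBDA size parameter are hypothetical. On a real machine the
        BIOS reports the remaining conventional memory in kB through
        INT 12h.

```c
/* Real-mode address arithmetic: linear = segment * 16 + offset. */
unsigned long real_mode_linear(unsigned int segment, unsigned int offset)
{
    return ((unsigned long)segment << 4) + offset;
}

/* Conventional memory (in kB) below the 640kB mark that remains after
 * the BIOS takes 'ebda_kb' kilobytes for an extended BIOS data area. */
unsigned int usable_conventional_kb(unsigned int ebda_kb)
{
    /* 1024kB addressable in real mode, minus 384kB for ROMs and video */
    const unsigned int conventional_kb = 1024 - 384;

    return ebda_kb < conventional_kb ? conventional_kb - ebda_kb : 0;
}
```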



Problem: The bootrom doesn't receive a bootp answer and just hangs printing
         dots

        First check whether bootpd is running on your server or is
        started properly from inetd. Then check that the server's
        /etc/bootptab is set up correctly. In particular, the hardware
        address and the client's IP address and name have to be correct.
        Most bootp servers can write debugging information into a log
        file. Use that feature to verify that your server really receives
        bootp requests from the client's bootrom and sends out a valid
        answer. Also check for error messages in the log file. Even if
        your bootpd doesn't write into a separate log file, it might use
        syslog on your system, so find the log file name in your syslogd
        configuration file and check for errors.
        If you are able to use a network tracing program like tcpdump,
        check that the bootrom sends out correct requests and that the
        server answers correctly. If both look right, the problem is more
        likely to be in the bootrom, so you should create a new bootrom
        image with the packet driver debugging module included. You should
        then see the bootrom's request packets going out, and the server's
        answers coming in. If no packets are coming in although you have
        verified that the server is sending out correct replies, there
        might be a problem with your network card. Did you set it up
        correctly? Is a cable connected (no kidding, those things really
        happen)? If everything else fails, try to boot the diskless client
        with the intended operating system and try to access the network
        card using that operating system's tools.
        If the server is not sending out answer packets, but the bootpd
        log files indicate correct answers, it might be a problem with
        the ARP setup on your server. Normally ARP shouldn't be a concern
        for you. However, some older versions of bootpd for Linux had
        problems here, which could be solved by setting the kernel ARP
        table manually.
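
        When reading a tcpdump trace, it helps to know what a BOOTP packet
        contains. The layout below follows RFC 951 (requests go from UDP
        port 68 on the client to port 67 on the server). The reply check
        is a hypothetical helper illustrating what a client typically
        compares, namely the transaction id and the hardware address; it
        is not taken from the bootrom sources.

```c
#include <stdint.h>
#include <string.h>

#define BOOTREQUEST 1
#define BOOTREPLY   2

/* BOOTP message as described in RFC 951; multi-byte fields are in
 * network byte order. All fields are naturally aligned, so the struct
 * has no padding and is exactly 300 bytes long. */
struct bootp_msg {
    uint8_t  op;          /* BOOTREQUEST or BOOTREPLY */
    uint8_t  htype;       /* hardware type, 1 = 10 Mb Ethernet */
    uint8_t  hlen;        /* hardware address length, 6 for Ethernet */
    uint8_t  hops;
    uint32_t xid;         /* transaction id, echoed in the reply */
    uint16_t secs;
    uint16_t unused;
    uint32_t ciaddr;      /* client IP address, if already known */
    uint32_t yiaddr;      /* "your" IP address, filled in by the server */
    uint32_t siaddr;      /* server IP address */
    uint32_t giaddr;      /* gateway IP address */
    uint8_t  chaddr[16];  /* client hardware address */
    uint8_t  sname[64];   /* optional server host name */
    uint8_t  file[128];   /* boot file name */
    uint8_t  vend[64];    /* vendor-specific area */
};

/* Hypothetical check: does 'rep' answer the request 'req'? */
int bootp_reply_matches(const struct bootp_msg *req,
                        const struct bootp_msg *rep)
{
    if (rep->op != BOOTREPLY || rep->xid != req->xid)
        return 0;
    if (rep->hlen != req->hlen || req->hlen > sizeof req->chaddr)
        return 0;
    return memcmp(rep->chaddr, req->chaddr, req->hlen) == 0;
}
```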



Problem: The bootrom did get a bootp answer but is not able to load the
         bootimage file

        This is likely to be a problem with the tftpd setup on the server.
        Does tftpd run when you start up the bootrom code? If not, check
        that inetd is configured correctly. There might also be a TCP/IP
        wrapper running on your server which prohibits access to the tftp
        service (tftp is known to be very insecure and is therefore a
        candidate for being started through an internet security wrapper
        like tcpd). Check any access configuration files for tcpd.
        Furthermore, tftpd has to be able to access the bootimage file. It
        usually runs as a user with very low privileges for security
        reasons and might not be allowed to read the bootimage file, so
        you should check and set the bootimage file's permissions correctly.
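
        Since tftpd runs with very few privileges, the bootimage file
        normally has to be readable by everybody. The following sketch
        checks the file's permission bits with stat(); the function name
        is hypothetical, and it does not check the directories on the
        path, which also have to be searchable (execute permission) for
        tftpd's user.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical check: returns 1 if 'path' is a regular file with the
 * "other" read bit set, 0 otherwise (and prints the reason). */
int readable_by_tftpd(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);                  /* e.g. file does not exist */
        return 0;
    }
    if (!S_ISREG(st.st_mode)) {
        fprintf(stderr, "%s: not a regular file\n", path);
        return 0;
    }
    if (!(st.st_mode & S_IROTH)) {
        fprintf(stderr, "%s: not world-readable, try chmod o+r\n", path);
        return 0;
    }
    return 1;
}
```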



Problem: The boot image loader reports an error

        Congratulations! You just discovered a bug in the boot loader.
        Please report it to me.



Problem: When I'm using the bootrom menu to load a Unix system off the local
         hard disk, it reports some weird error messages to me (especially,
         SCO Unix says that it's not able to open boot device). However,
         booting without the bootrom works without a problem.

        Some operating systems, especially Unix-like systems, read the
        partition table after booting and try to find their own boot
        partition. When you use the bootrom, the Unix partition doesn't
        need to be marked as bootable, and if it isn't, the Unix startup
        loader fails. To solve this problem, mark the Unix partition
        active with an fdisk program. To prevent it from starting instead
        of the bootrom, create the bootrom so that it does not allow the
        BIOS to search for boot partitions on the installed hard disks
        (the 'makerom' program, which gets run when you do a 'make
        bootrom', will ask you about this right after selecting a kernel
        image).



Problem: I'm loading Linux onto my diskless client and the kernel tells
         me to insert a root floppy and press enter

        First you should check that you built your kernel correctly. It
        should have support for the root filesystem built in. If you want
        to use an NFS mounted directory as root, the kernel needs TCP/IP
        support compiled in. It also has to have a driver for your
        network card built in, and NFS and NFSROOT both have to be
        selected. When using a ramdisk, its support has to be compiled in,
        as well as support for the filesystem with which you formatted
        the ramdisk image. Please note that the loaded kernel is not
        able to use modules at bootup time (only _after_ the root
        filesystem has been mounted, but not before), so everything has
        to be compiled in.

        If the kernel is not able to mount its root via NFS, there can be
        many different reasons. All addresses in the /etc/bootptab file
        have to be correct, and the access rights on the server have to
        be set correctly - not only in /etc/exports but also the
        permissions of the directory to be mounted. If that's correct,
        check that a portmapper is running on the server, and that the
        mountd and nfsd services have registered with it correctly. You
        can usually do this by running the command

                        rpcinfo -p

        Note that services are only listed here if their associated server
        process is really running. The rpcinfo output should then look
        something like this:

                   program vers proto   port
                    100000    2   tcp    111  portmapper
                    100000    2   udp    111  portmapper
                    100003    2   udp   2049  nfs
                    100003    2   tcp   2049  nfs
                    100005    1   udp    663  mountd
                    100005    1   tcp    665  mountd

        However, the port numbers might be different.
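
        If you want to automate this check, for example in a setup script,
        the program numbers are the useful part of the output: 100000 is
        the portmapper, 100003 is nfs and 100005 is mountd. The sketch
        below scans captured rpcinfo -p output for a program number; how
        you capture the output (for instance with popen()) is left open,
        and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a line of the captured rpcinfo -p
 * output starts with the given program number, 0 otherwise. The header
 * line ("program vers proto port") simply fails to parse as a number. */
int rpc_program_registered(const char *rpcinfo_output, unsigned long program)
{
    const char *line = rpcinfo_output;

    while (line != NULL && *line != '\0') {
        unsigned long prog;

        if (sscanf(line, "%lu", &prog) == 1 && prog == program)
            return 1;
        line = strchr(line, '\n');     /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return 0;
}
```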

        When the kernel starts mounting the NFS root directory, it prints
        out the name of that directory on the server. It should be the
        same as the one configured in /etc/bootptab. Check that it's
        correct. If not, you can try to use the -d option with mknbi-linux
        to specify the name explicitly.

        If the kernel gets an error from the server's nfsd, it prints
        a number which is defined according to the NFS protocol. The
        most commonly occurring numbers are:

                 1  -  permission denied to access directory
                 2  -  directory doesn't exist
                 5  -  I/O error on server filesystem
                13  -  nfsd is unable to access directory
                20  -  path name is not a directory
                63  -  path name is too long

        Note that some nfsd and mountd programs only read /etc/exports
        on startup. If you changed this file afterwards, you will have
        to restart both daemons. Additionally, with nfsd versions for
        Linux earlier than 2.1, you will have problems with special files
        like UNIX domain sockets or block/character special files on
        your NFS partitions. You should therefore use the latest
        available versions.



Problem: The Linux kernel mounts its root correctly but doesn't give me
         a login prompt.

1.)     This might be the result of an incorrect setup of the root
        filesystem (see No. 2 below). However, it's also possible that your
        server reports the wrong major/minor numbers for the console device
        even though you specified them correctly in the NFS mounted root
        directory. I know of this problem with AIX and HP-UX servers,
        but there might be others as well which don't transfer special
        device files via NFS the way Linux requires. One solution is to
        boot the diskless client with a ramdisk image as its root, and
        then mount the should-be-root directory on the server using NFS.
        Then you can create the special files in the dev directory using
        Linux's mknod program, and use the NFS root mounting bootimage
        again.
        Another way is to find out how the server operating system
        encodes major/minor numbers on its own filesystem. For example,
        HP-UX uses a 32 bit device number, with the 8 highest bits being
        the major number, and the lower 24 bits being the minor device
        number:

                major << 24 | minor   ==>   aaaaaaaabbbbbbbbbbbbbbbbbbbbbbbb

        In this representation (a) means a bit of the major number, and
        (b) means a bit of the minor number. Linux uses the following
        scheme instead:

                major << 8 | minor    ==>   0000000000000000aaaaaaaabbbbbbbb

        The NFS protocol transfers these 32 bits just as they are,
        without any further interpretation regarding major/minor numbers.
        This means that all relevant bits in the Linux representation
        fit into the minor number on HP-UX. Therefore, if you create a
        device on the HP-UX server, you always have to give it a major
        number of zero and compute the minor number the way shown
        above for Linux. For example, to let Linux see a device 5/2 in
        its NFS-mounted /dev directory, you can compute the minor device
        number on HP-UX as

                5 << 8 | 2    ==>  1282

        So the device to create on the HP-UX server is 0/1282. This will
        let Linux see 5/2 after the filesystem is mounted with NFS.
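
        The encoding above can be checked with a few lines of C. This
        sketch implements the two layouts exactly as described here (the
        16-bit Linux layout of that time; later Linux kernels use a wider
        encoding); the function names are hypothetical.

```c
/* Linux layout as described above: major << 8 | minor. */
unsigned long linux_dev(unsigned int major, unsigned int minor)
{
    return ((unsigned long)(major & 0xff) << 8) | (minor & 0xff);
}

/* HP-UX layout: 8-bit major in the top byte, 24-bit minor below it. */
unsigned long hpux_dev(unsigned int major, unsigned long minor)
{
    return ((unsigned long)(major & 0xff) << 24) | (minor & 0xffffffUL);
}

/* Minor number to pass to mknod on the HP-UX server (with major 0) so
 * that the unchanged 32 bits decode as linux_major/linux_minor on the
 * Linux client after NFS transfers them. */
unsigned long hpux_minor_for_linux(unsigned int linux_major,
                                   unsigned int linux_minor)
{
    return linux_dev(linux_major, linux_minor);
}
```

        For the 5/2 example, hpux_minor_for_linux(5, 2) yields 1282, and
        hpux_dev(0, 1282) produces the same 32 bits as linux_dev(5, 2).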

2.)     Another reason for this problem might be that the init process
        doesn't get started at all. This can be a result of incorrect
        shared libraries, which the client might see but without a proper
        ld.so.cache file, or of shared libraries that the client can't
        reach at all. Bruce Janson and Markus Gutschke collected a good
        list of possibilities, which you should check:

                - you do not have a private copy of the /, /etc, /var, ...
                  directories

                - your /dev directory is missing entries for /dev/zero and/or
                  /dev/null or is sharing device entries from a server that uses
                  different major and minor numbers (i.e. a server that is not
                  running Linux - see above).

                - your /lib directory is missing libraries (most notably libc*
                  and/or libm*) or does not have the loader files ld*.so*

                - you neglected to run ldconfig to update /etc/ld.so.cache
                  or you do not have a configuration file for ldconfig.

                - your /etc/inittab and/or /etc/rc.d/* files have not been
                  customized for the clients.

                - your kernel is missing some crucial compile-time feature
                  (such as NFS filesystem support, booting from the net,
                  transname (optional), ELF file support, networking support,
                  or a driver for your ethernet card).

                - missing init executable (in one of the directories
                  known by the kernel: /etc, /sbin, ...)

                - missing /etc/inittab

                - missing /dev/tty?

                - missing /bin/sh

                - system programs that insist on creating/writing to files
                  outside of /var (mount and /etc/mtab* is the canonical
                  example)
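
        Several items of this list can be checked mechanically on the
        server before you boot the client. The sketch below tests for a
        handful of the paths mentioned above inside the client's root
        directory; the path list is an illustrative subset, not complete,
        and the function name is hypothetical. Note that stat() follows
        symbolic links as the server resolves them, so an absolute link
        that only makes sense on the client may be reported as missing.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Illustrative subset of the checklist, relative to the client root.
 * init may also live in /etc on older systems. */
static const char *const required_paths[] = {
    "dev/null", "dev/zero", "etc/inittab", "bin/sh", "sbin/init",
};

/* Hypothetical helper: prints each missing path below 'root' and
 * returns how many of the listed paths were missing. */
int check_client_root(const char *root)
{
    char path[4096];
    struct stat st;
    size_t i;
    int missing = 0;

    for (i = 0; i < sizeof required_paths / sizeof required_paths[0]; i++) {
        snprintf(path, sizeof path, "%s/%s", root, required_paths[i]);
        if (stat(path, &st) != 0) {
            printf("missing: %s\n", path);
            missing++;
        }
    }
    return missing;
}
```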



Problem: Can't compile the bootrom

        Please get in touch with me if you encounter any problems
        while recompiling the bootrom.
