There is a large body of good-practice traditions for open-source code that helps other people port, use, and cooperate with developing it. Some of these conventions are traditional in the Unix world and predate Linux; others have developed recently in response to particular new tools and technologies such as the World Wide Web.
This document will help you learn good practice. It is organized into topic sections, each containing a series of checklist items. Think of these as a pre-flight checklist for your distribution.
This document will be posted monthly to the newsgroup comp.os.linux.answers. The document is archived on a number of Linux FTP sites, including metalab.unc.edu in pub/Linux/docs/HOWTO.
You can also view the latest version of this HOWTO on the World Wide Web via the URL http://metalab.unc.edu/LDP/HOWTO/Software-Release-Practice.html.
Feel free to mail any questions or comments about this HOWTO to Eric S. Raymond, esr@snark.thyrsus.com.
As the load on maintainers of archives like Metalab, the PSA site and CPAN increases, there is an increasing trend for submissions to be processed partly or wholly by programs (rather than entirely by a human).
This makes it more important for project and archive-file names to fit regular patterns that computer programs can parse and understand.
It's helpful to everybody if your archive files all have GNU-like names -- all-lower-case alphanumeric stem prefix, followed by a dash, followed by a version number, extension, and other suffixes.
Let's suppose you have a project you call `foobar' at version 1, release 2, level 3. If it's got just one archive part (presumably the sources), here's what its names should look like:
foobar-1.2.3.tar.gz   The source archive
foobar.lsm            The LSM file (assuming you're submitting to Metalab).
Please don't use these:
foobar123.tar.gz
    This looks to many programs like an archive for a project called `foobar123' with no version number.
foobar1.2.3.tar.gz
    This looks to many programs like an archive for a project called `foobar1' at version 2.3.
foobar-v1.2.3.tar.gz
    Many programs think this goes with a project called `foobar-v1'.
foo_bar-1.2.3.tar.gz
    The underscore is hard for people to speak, type, and remember.
FooBar-1.2.3.tar.gz
    Unless you like looking like a marketing weenie. This is also hard for people to speak, type, and remember.
If you have to differentiate between source and binary archives, or between different kinds of binary, or express some kind of build option in the file name, please treat that as a file extension to go after the version number. That is, please do this:
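foobar-1.2.3.src.tar.gz              sources
foobar-1.2.3.bin.tar.gz              binaries, type not specified
foobar-1.2.3.bin.ELF.tar.gz          ELF binaries
foobar-1.2.3.bin.ELF.static.tar.gz   ELF binaries statically linked
foobar-1.2.3.bin.SPARC.tar.gz        SPARC binaries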
Please don't use names like `foobar-ELF-1.2.3.tar.gz', because programs have a hard time telling type infixes (like `-ELF') from the stem.
A good general form of name has these parts in order:
Some projects and communities have well-defined conventions for names and version numbers that aren't necessarily compatible with the above advice. For instance, Apache modules are generally named like mod_foo, and have both their own version number and the version of Apache with which they work. Likewise, Perl modules have version numbers that can be treated as floating point numbers (e.g., you might see 1.303 rather than 1.3.3), and the distributions are generally named Foo-Bar-1.303.tar.gz for version 1.303 of module Foo::Bar.
Look for and respect the conventions of specialized communities and developers; for general use, follow the above guidelines.
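As a rough illustration of why regular names matter to archive software, here is a minimal Python sketch of the kind of check such a program might apply. The pattern and the function name parse_archive_name are assumptions made for this example; they are not the rules used by Metalab or any other archive.

```python
import re

# Assumed pattern for the general naming scheme described above:
# all-lower-case alphanumeric stem, a dash, a dotted numeric version,
# then one or more dot-separated suffixes (type markers, archive extensions).
ARCHIVE_NAME = re.compile(
    r"^(?P<stem>[a-z][a-z0-9]*)"
    r"-(?P<version>\d+(?:\.\d+)*)"
    r"(?P<suffixes>(?:\.[A-Za-z][A-Za-z0-9]*)+)$"
)


def parse_archive_name(filename):
    """Return (stem, version, suffix list), or None if the name does not fit."""
    match = ARCHIVE_NAME.match(filename)
    if match is None:
        return None
    suffixes = match.group("suffixes").lstrip(".").split(".")
    return match.group("stem"), match.group("version"), suffixes


if __name__ == "__main__":
    # A well-formed name splits cleanly into its parts.
    print(parse_archive_name("foobar-1.2.3.tar.gz"))
    # A type infix before the version defeats the parse.
    print(parse_archive_name("foobar-ELF-1.2.3.tar.gz"))
```

A name with a type infix such as `foobar-ELF-1.2.3.tar.gz' is rejected by this sketch, which is the failure mode described above: the program cannot tell where the stem ends.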
The stem prefix should be common to all a project's files, and it should be easy to read, type, and remember. So please don't use underscores. And don't capitalize or BiCapitalize without extremely good reason -- it messes up the natural human-eyeball search order and looks like some marketing weenie trying to be clever.
It confuses people when two different projects have the same stem name. So try to check for collisions before your first release. A good place to check is the index file of Metalab.
The license you choose defines the social contract you wish to set up among your co-developers and users. The copyright you put on the software will function mainly as a legal assertion of your right to set license terms on the software and derivative works of the software.
Anything that is not public domain has a copyright, possibly more than one. Under the Berne Convention (which has been U.S. law since 1989), the copyright does not have to be explicit. That is, the authors of a work hold copyright even if there is no copyright notice.
Who counts as an author can be very complicated, especially for software that has been worked on by many hands. This is why licenses are important. By setting out the terms under which material can be used, they grant rights to the users that protect them from arbitrary actions by the copyright holders.
In proprietary software, the license terms are designed to protect the copyright. They're a way of granting a few rights to users while reserving as much legal territory as possible for the owner (the copyright holder). The copyright holder is very important, and the license logic so restrictive that the exact technicalities of the license terms are usually unimportant.
In open-source software, the situation is usually the exact opposite; the copyright exists to protect the license. The only rights the copyright holder always keeps are to enforce the license. Otherwise, only a few rights are reserved and most choices pass to the user. In particular, the copyright holder cannot change the terms on a copy you already have. Therefore, in open-source software the copyright holder is almost irrelevant -- but the license terms are very important.
Normally the copyright holder of a project is the current project leader or sponsoring organization. Transfer of the project to a new leader is often signaled by changing the copyright holder. However, this is not a hard and fast rule; many open-source projects have multiple copyright holders, and there is no instance on record of this leading to legal problems.
Some projects choose to assign copyright to the Free Software Foundation, on the theory that it has an interest in defending open source and lawyers available to do it.
For licensing purposes, we can distinguish several different kinds of rights that a license may convey: rights to copy and redistribute, rights to use, rights to modify for personal use, and rights to redistribute modified copies. A license may restrict or attach conditions to any of these rights.
The Open Source Definition (OSD) is the result of a great deal of thought about what makes software ``open source'' or (in older terminology) ``free''. Its constraints on licensing require that:
The guidelines prohibit restrictions on redistribution of modified binaries; this meets the needs of software distributors, who need to be able to ship working code without encumbrance. It allows authors to require that modified sources be redistributed as pristine sources plus patches, thus establishing the author's intentions and an ``audit trail'' of any changes by others.
The OSD is the legal definition of the `OSI Certified Open Source' certification mark, and as good a definition of ``free software'' as anyone has ever come up with. All of the standard licenses (MIT, BSD, Artistic, and GPL/LGPL) meet it (though some, like GPL, have other restrictions which you should understand before choosing it).
Note that licenses which allow noncommercial use only do not qualify as open-source licenses, even if they are decorated with ``GPL'' or some other standard license. They discriminate against particular occupations, persons, and groups. They make life too complicated for CD-ROM distributors and others trying to spread open-source software commercially.
Here's how to translate the theory above into practice:
In some cases, if you have a sponsoring organization behind you with lawyers, you might wish to give copyright to that organization.
The Open Source Definition is the community gold standard for licenses. The OSD is not a license itself; rather, it defines a minimum set of rights that a license must guarantee in order to be considered an open-source license. The OSD, and supporting materials, may be found at the web site of the Open Source Initiative.
The widely-known OSD-conformant licenses have well-established interpretive traditions. Developers (and, to the extent they care, users) know what they imply, and have a reasonable take on the risks and tradeoffs they involve. Therefore, use one of the standard licenses carried on the OSI site if at all possible.
If you must write your own license, be sure to have it certified by OSI. This will avoid a lot of argument and overhead. Unless you've been through it, you have no idea how nasty a licensing flamewar can get; people become passionate because the licenses are regarded as almost-sacred covenants touching the core values of the open-source community.
Furthermore, the presence of an established interpretive tradition may prove important if your license is ever tested in court. At time of writing (late 1999) there is no case law either supporting or invalidating any open-source license. However, it is a legal doctrine (at least in the U.S., and probably in other common-law countries such as England and the rest of the British Commonwealth) that courts are supposed to interpret licenses and contracts according to the expectations and practices of the community in which they originated.
Most of these are concerned with ensuring portability, not only across Linuxes but to other Unixes as well. Being portable to other Unixes is not just a worthy form of professionalism and hackerly politeness, it's valuable insurance against future changes in Linux itself.
Finally, other people will try to build your code on non-Linux systems; portability minimizes the number of annoying perplexed email messages you will get.
For portability and stability, you should write either in ANSI C or a scripting language that is guaranteed portable because it has just one cross-platform implementation.
Scripting languages that qualify include Python, Perl, Tcl, and Emacs Lisp. Plain old shell does not qualify; there are too many different implementations with subtle idiosyncrasies, and the shell environment is subject to disruption by user customizations such as shell aliases.
Java holds promise as a portable language, but the Linux-available implementations are still scratchy and poorly integrated with Linux. Java is still a bleeding-edge choice, though one likely to become more popular as it matures.
If you are writing C, do feel free to use the full ANSI features -- including function prototypes, which will help you spot cross-module inconsistencies. The old-style K&R compilers are history.
On the other hand, do not assume that GCC-specific features such as the `-pipe' option or nested functions are available. These will come around and bite you the second somebody ports to a non-Linux, non-GCC system.
If you're writing C, use autoconf/automake/autoheader to handle portability issues, do system-configuration probes, and tailor your makefiles. People building from sources today expect to be able to type "./configure; make" and get a clean build -- and rightly so.
If you're writing C, test-compile with -Wall and clean up every warning it reports at least once before each release. This catches a surprising number of errors. For real thoroughness, compile with -pedantic as well.
If you're writing Perl, check your code with perl -c (and maybe -T, if applicable). Use perl -w and 'use strict' religiously. (See the Perl documentation for discussion.)
Run a spell-checker on your documentation and README files. If you look like you can't spell and don't care, people will assume your code is sloppy and careless too.
These guidelines describe how your distribution should look when someone downloads, retrieves and unpacks it.
The single most annoying mistake newbie developers make is to build tarballs that unpack the files and directories in the distribution into the current directory, potentially stepping on files already located there. Never do this!
Instead, make sure your archive files all have a common directory part named after the project, so they will unpack into a single top-level directory directly beneath the current one.
Here's a makefile trick that, assuming your distribution directory is named `foobar' and SRC contains a list of your distribution files, accomplishes this. It requires GNU tar 1.13:

VERS=1.0
foobar-$(VERS).tar.gz:
	tar --name-prefix='foobar-$(VERS)/' -czf foobar-$(VERS).tar.gz $(SRC)
If you have an older tar program, do something like this:
foobar-$(VERS).tar.gz:
	@ls $(SRC) | sed s:^:foobar-$(VERS)/: >MANIFEST
	@(cd ..; ln -s foobar foobar-$(VERS))
	(cd ..; tar -czvf foobar/foobar-$(VERS).tar.gz `cat foobar/MANIFEST`)
	@(cd ..; rm foobar-$(VERS))
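If neither makefile recipe suits your setup, the same effect can be had from a script. The following is a minimal Python sketch using the standard tarfile module, not a replacement for the recipes above. The function names make_release_tarball and top_level_entries are invented for this example, and it assumes the source file paths are relative to the current directory.

```python
import tarfile


def make_release_tarball(project, version, source_files):
    """Write project-version.tar.gz with every file stored under project-version/."""
    prefix = f"{project}-{version}"
    archive_name = f"{prefix}.tar.gz"
    with tarfile.open(archive_name, "w:gz") as archive:
        for path in source_files:
            # arcname sets the name stored inside the archive, so the files
            # unpack beneath a single top-level directory.
            archive.add(path, arcname=f"{prefix}/{path}")
    return archive_name


def top_level_entries(archive_path):
    """Return the set of distinct first path components found in an archive."""
    tops = set()
    with tarfile.open(archive_path) as archive:
        for name in archive.getnames():
            # Skip empty components and a leading "./" before taking the first part.
            parts = [part for part in name.split("/") if part not in ("", ".")]
            if parts:
                tops.add(parts[0])
    return tops
```

Before uploading, a call such as top_level_entries("foobar-1.0.tar.gz") should return a set with exactly one member, the project directory name. More than one member means the archive will scatter files into the current directory.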
Have a file called README or READ.ME that is a roadmap of your source distribution. By ancient convention, this is the first file intrepid explorers will read after unpacking the source.
Good things to have in the README include:
Before even looking at the README, your intrepid explorer will have scanned the filenames in the top-level directory of your unpacked distribution. Those names can themselves convey information. By adhering to certain standard naming practices, you can give the explorer valuable clues about what to look in next.
Here are some standard top-level file names and what they mean. Not every distribution needs all of these.
README     the roadmap file, to be read first
INSTALL    configuration, build, and installation instructions
AUTHORS    list of project contributors
NEWS       recent project news
HISTORY    project history
COPYING    project license terms (GNU convention)
LICENSE    project license terms
MANIFEST   list of files in the distribution
FAQ        plain-text Frequently-Asked-Questions document for the project
TAGS       generated tag file for use by Emacs or vi
Note the overall convention that filenames with all-caps names are human-readable metainformation about the package, rather than build components.
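The all-caps convention is regular enough to check mechanically as part of a pre-release routine. Here is a small Python sketch under that assumption. The function names are invented for this example, and the only file it insists on is the README (or READ.ME) roadmap described earlier.

```python
import os


def metainfo_files(directory):
    """List top-level regular files whose names contain no lower-case letters."""
    return sorted(
        name
        for name in os.listdir(directory)
        # str.isupper() is true when every cased character is upper-case,
        # so README and READ.ME qualify while Makefile and main.c do not.
        if name.isupper() and os.path.isfile(os.path.join(directory, name))
    )


def has_roadmap(directory):
    """True if a README or READ.ME file is present at the top level."""
    return bool({"README", "READ.ME"} & set(os.listdir(directory)))
```

Running metainfo_files on the unpacked distribution directory gives a quick view of what a newcomer's eye will land on first.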
Having a FAQ can save you a lot of grief. When a question about the project comes up often, put it in the FAQ; then direct users to read the FAQ before sending questions or bug reports. A well-nurtured FAQ can decrease the support burden on the project maintainers by an order of magnitude or more.
Having a HISTORY or NEWS file with timestamps in it for each release is valuable. Among other things, it may help establish prior art if you are ever hit with a patent-infringement lawsuit (this hasn't happened to anyone yet, but best to be prepared).
Your software will change over time as you put out new releases. Some of these changes will not be backward-compatible. Accordingly, you should give serious thought to designing your installation layouts so that multiple installed versions of your code can coexist on the same system. This is especially important for libraries -- you can't count on all your client programs to upgrade in lockstep with your API changes.
The Emacs, Python, and Qt projects have a good convention for handling this: version-numbered directories. Here's how an installed Qt library hierarchy looks (${ver} is the version number):
/usr/lib/qt
/usr/lib/qt-${ver}
/usr/lib/qt-${ver}/bin       # Where you find moc
/usr/lib/qt-${ver}/lib       # Where you find .so
/usr/lib/qt-${ver}/include   # Where you find header files
With this organization, you can have multiple versions coexisting. Client programs have to specify the library version they want, but that's a small price to pay for not having the interfaces break on them.
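To make the layout concrete, here is a short Python sketch of how an install script or a client's build script might compute and discover such directories. The function names and the /usr/lib default are assumptions for illustration, not part of any project's actual tooling.

```python
import os
import posixpath


def versioned_layout(name, version, prefix="/usr/lib"):
    """Return the bin, lib, and include paths for one installed version."""
    root = posixpath.join(prefix, f"{name}-{version}")
    return {sub: posixpath.join(root, sub) for sub in ("bin", "lib", "include")}


def installed_versions(name, prefix="/usr/lib"):
    """Return version strings for directories named name-<version> under prefix."""
    marker = name + "-"
    versions = []
    for entry in os.listdir(prefix):
        rest = entry[len(marker):]
        # Require a leading digit so that unrelated names sharing the
        # stem (for example a hypothetical qt-docs) are not mistaken for versions.
        if (entry.startswith(marker) and rest[:1].isdigit()
                and os.path.isdir(os.path.join(prefix, entry))):
            versions.append(rest)
    return sorted(versions)  # plain string sort, not a version-aware ordering
```

A client that needs a particular interface would pick its version explicitly, for example versioned_layout("qt", "2.0")["include"], instead of trusting whatever the unversioned directory currently holds.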
The de-facto standard format for installable binary packages is that used by the Red Hat Package Manager, RPM. It's featured in the most popular Linux distribution, and supported by effectively all other Linux distributions (except Debian and Slackware; and Debian can install from RPMs).
Accordingly, it's a good idea for your project site to provide installable RPMs as well as source tarballs.
It's also a good idea for you to include in your source tarball the RPM spec file, with a production that makes RPMs from it in your Makefile. The spec file should have the extension `.spec'; that's how the rpm -t option finds it in a tarball.
For extra style points, generate your spec file with a shellscript that automatically plugs in the correct version number by analyzing the Makefile or a version.h.
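The idea can be sketched in a few lines. The example below uses Python in place of a shellscript, and it rests on two assumptions that you should adapt to your project: that version.h defines a macro named VERSION as a quoted string, and that the spec template marks the substitution point with the placeholder @VERSION@.

```python
import re


def read_version(header_text):
    """Extract the string from a line of the form: #define VERSION "1.2.3"."""
    match = re.search(
        r'^\s*#\s*define\s+VERSION\s+"([^"]+)"', header_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no VERSION definition found")
    return match.group(1)


def render_spec(template_text, version):
    """Fill the assumed @VERSION@ placeholder in a spec file template."""
    return template_text.replace("@VERSION@", version)


if __name__ == "__main__":
    header = '/* version.h */\n#define VERSION "1.2.3"\n'
    template = "Name: foobar\nVersion: @VERSION@\n"
    print(render_spec(template, read_version(header)), end="")
```

Keeping the version number in exactly one place means the tarball name, the spec file, and the program's own version report cannot drift apart.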
Your software won't do the world much good if nobody but you knows it exists. Also, developing a visible presence for the project on the Internet will assist you in recruiting users and co-developers. Here are the standard ways to do that.
Announce new releases to comp.os.linux.announce. Besides being widely read itself, this group is a major feeder for web-based what's-new sites like Freshmeat.
Find a USENET topic group directly relevant to your application, and announce there as well. Post only where the function of the code is relevant, and exercise restraint.
If (for example) you are releasing a program written in Perl that queries IMAP servers, you should certainly post to comp.mail.imap. But you should probably not post to comp.lang.perl unless the program is also an instructive example of cutting-edge Perl techniques.
Your announcement should include the URL of a project website.
If you intend to build any substantial user or developer community around your project, it should have a website. Standard things to have on the website include:
Some project sites even have URLs for anonymous access to the master source tree.
It's standard practice to have a private development list through which project collaborators can communicate and exchange patches. You may also want to have an announcements list for people who want to be kept informed of the project's progress.
For the last several years, the Metalab archive has been the most important interchange location for Linux software.
Other important locations include:
Managing a project well when all the participants are volunteers presents some unique challenges. This is too large a topic to cover in a HOWTO. Fortunately, there are some useful white papers available that will help you understand the major issues.
For discussion of basic development organization and the release-early-release-often `bazaar mode', see The Cathedral and the Bazaar.
For discussion of motivational psychology, community customs, and conflict resolution, see Homesteading the Noosphere.
For discussion of economics and appropriate business models, see The Magic Cauldron.
These papers are not the last word on open-source development. But they were the first serious analyses to be written, and have yet to be superseded.