IAFA-WG

A Guide to FTP Site Administration (IAFA DOC I)

DRAFT 92.06.22

Introduction
------------

As the growth of the Internet continues, it has become fair to speak of an "Internet infostructure". A companion to the extensive physical infrastructure that is responsible for the routing and delivery of messages, the Internet infostructure comprises the growing body of information and the structure that supports it. Much of this infostructure is available equally to all users, while other areas are available to anyone participating in specific network-based workgroups. The Internet acts as an enabling technology that makes this wealth of information available to those who know how to access it.

In this document we concentrate on the remote file transfer model for sharing information in an Internet environment. On the net today this model is primarily implemented using the File Transfer Protocol (FTP) [1]. Available from most sites, an FTP service can provide a secure and reliable mechanism for copying specific files from one host to another across the network.

In particular, we aim to provide information to anyone contemplating setting up or maintaining an Internet information archive using the facilities of FTP. A companion document provides specific recommendations on encoding various types of information to be offered by such a site.

This document does not attempt to define specifically how the file system for an archive should be arranged, since this depends upon the resources available, the information being distributed and the organizational structure of the groups administering the archive. We do offer general guidelines and some hints and recommendations.

Site administrators setting up or running Anonymous FTP archives should also refer to RFC [* ???? *] "Publishing Information on the Internet with Anonymous FTP".
This provides a detailed description of a standardized framework in which valuable additional information can be provided alongside the data that you distribute. Such additional information allows users and automated indexing tools to search for, locate and retrieve desired information from across the Internet more easily.

What is Anonymous FTP?
----------------------

The FTP service has been around since the early days of the Internet and it has been a successful service. According to statistics generated by the National Science Foundation (NSF), about 50% of current network traffic (by volume) on the NSFnet backbone in the United States is used for this purpose [2].

The FTP system is designed around a client/server model. Users invoke an FTP client, which then connects to an FTP server process running on a remote host. The server is responsible for verifying the authenticity of the user and performing the operations requested by the user through the client, while enforcing the security and integrity of the host system.

In an ordinary FTP session one logs into an account on the remote host from which files are to be retrieved or on which they are to be placed. Commands allow the user to navigate through the remote file system and copy or delete files. This form of access requires one to have the name and password of the account on the remote system.

Users of FTP normally must go through a login sequence when connecting to a foreign host. They are then allowed to copy those files to which they have been granted access permission. The login sequence provides basic authentication and security in an open systems environment with hundreds of thousands of interconnected hosts and millions of users. The underlying FTP and Internet communications protocols provide needed error checking and thus ensure the integrity of the transferred information.
The basic FTP-based file sharing model has been extended through the creation of a network of universally accessible FTP archive sites. Information at such sites is available to all users of the Internet, without the usual authentication step, using the convention of "anonymous FTP". Under this mechanism, site administrators make a collection of files available to the Internet community by creating a special "anonymous" user account. Such accounts either require no password, or accept a variety of strings as a password (for example, many sites allow the password to be any string that is formatted as a valid email address). This allows anyone to connect to such a site and copy information back to their own host.

Anonymous FTP indexing tools
----------------------------

The use of anonymous FTP began as a convention among relatively few sites on the Internet, and the names of sites supporting this mechanism were shared among users of the net through ad hoc methods. With the continued growth of the Internet [3] such methods are now seen to be inadequate. Recently, a number of information indexing and distribution services have been started to aid users in their search for information. It is expected that as the amount of information on the network grows, such services or information resource tools will become increasingly important.

The Internet Anonymous FTP Archive Working Group (IAFA-WG) has been formed under the auspices of the Internet Engineering Task Force to foster better utilization of the anonymous FTP archive mechanism for sharing information on the Internet.

Anonymous FTP archives (AFA) are not the ideal method for publishing information on the Internet. However, they do have the advantage of being relatively cheap and easy to establish, and they provide near universal access to their contents. With proper attention by archive site administrators, they provide a relatively simple way to distribute information.
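The anonymous login convention described above, as a user or an indexing tool might exercise it, can be sketched with Python's standard ftplib module. This is only an illustrative sketch: the function names anonymous_credentials and list_directory are hypothetical, and the check for an "@" character is an assumption standing in for whatever password rule a given site applies.

```python
from ftplib import FTP


def anonymous_credentials(email_address):
    """Return the (user, password) pair used by the anonymous FTP convention."""
    # Assumption: the site accepts any string that looks like an email address.
    if "@" not in email_address:
        raise ValueError("password should look like an email address")
    return ("anonymous", email_address)


def list_directory(host, email_address, directory="/"):
    """Log in anonymously and return the names found in one directory."""
    user, password = anonymous_credentials(email_address)
    with FTP(host, timeout=30) as ftp:  # opens the control connection
        ftp.login(user, password)       # the anonymous login step
        ftp.cwd(directory)              # navigate the remote file system
        return ftp.nlst()               # names only; no file data is copied
```

A call such as list_directory("ftp.example.org", "user@example.org", "/pub") would return the names in that directory. The host name here is a placeholder, not a real archive.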
Organization of this document
-----------------------------

This document is divided into two parts.

Part I discusses the reasons why an organization might wish to establish an anonymous FTP archive site. Specific issues, both technical and non-technical, are addressed to help you, the site administrator, determine if establishing such an archive is appropriate for your site.

Part II describes the steps needed to set up and maintain an anonymous FTP archive site. Specific examples for the most common operating system environments are included.

Part I: What is an anonymous FTP archive?

Internet archives are repositories of information of common interest to a group. For example, researchers sharing a common set of data will often put the information in a central location so that it can be accessed by all those in the group. How this access is performed can vary, but on the Internet, the FTP service and the associated remote file sharing paradigm are often used.

Why set up an anonymous FTP archive?

Site administrators set up anonymous FTP archives for any of several reasons:

a) Sharing of useful information. Many sites contain data which their owners would like to make publicly available. Research papers, locally produced software and datasets are some of the most common offerings. An anonymous FTP archive allows you to make this information available to a large audience that would not otherwise be able to access it easily.

b) Caching and redundancy. Sites at the end of a slow network link often set up such an archive to redistribute information obtained from other sites, so that the transfer need not be repeated multiple times for the same piece of information. Large software offerings such as X11 or TeX (which can total several hundred Megabytes) are often prime candidates for caching at the closer end of a slow network link.

c) A site's profile can be greatly enhanced by providing a valuable network resource.
A useful, large and well maintained archive site is such a resource to the general Internet community. This can give the group providing the archive higher visibility, which in turn can call attention to other work at that site.

d) A site with a large internal population of machines that are not themselves directly connected to the Internet (typically making use of a secure gateway) will often cache packages of interest to its internal population on a machine that is visible both internally and to the rest of the Internet. This can often ease the fears of management about perceived security problems through unrestricted Internet connectivity while providing a useful service to the Internet as a whole.

Initially, the majority of FTP archives resided on centrally controlled mainframes or minicomputers. The huge growth in the number of workstations and PCs on the Internet has led to the growth of a number of smaller, more site-specific archives. The current population of archives now offers everything from small collections of specialized data to offerings consisting of hundreds or even thousands of Megabytes of information, much of it shadowed or copied from other sites on the Internet.

One must bear in mind that there are certain responsibilities that go along with the operation of an AFA. These include making sure that the resources are being used in a secure, ethical and legal manner. In addition, the system administration must allocate sufficient manpower to ensure that these responsibilities are met.

Part II: Setting up and maintaining an anonymous FTP archive site.

Once it has been decided that an anonymous FTP account is to be created, it is up to you, the system administrator, to configure the FTP server to allow such access. Exactly how this is done is operating system dependent and may be as simple as creating a password entry with the appropriate information for an FTP pseudo-user account.
In most systems today, support for the anonymous FTP account is built into the FTP server program (primarily to enforce security). It is important to bear in mind that once the account is enabled, by its very definition, _anyone_ on the network can access this account through FTP.

You should note that the default anonymous FTP configuration (and corresponding documentation) supplied by your vendor may not always be secure or correct.

We give examples below on how to set up such an account for some common operating systems.

UNIX
----

In most implementations of UNIX, the FTP server, ftpd(8), is launched from inetd(8). The FTP server initially runs as root, changing to the UID of the specified user once the authentication step is complete.

The anonymous facility is enabled by adding an account for the user "ftp" to the password file. A typical /etc/passwd entry would look like this:

    ftp:*:67:20:Anonymous FTP account:/home/ftp:/bin/false

Note that a) the password entry for the account contains an asterisk ("*") and b) the shell is listed as /bin/false. This combination will prevent remote login access to the account through telnet(1) or by the BSD remote commands rlogin(1), rsh(1), rcp(1) etc.

Most UNIX systems will have the FTP server perform a chroot(2) upon startup. This limits file access for the process to the directory subtree specified by the anonymous FTP home directory (in this example /home/ftp, specified as part of the passwd entry above).

For security reasons, ~ftp and _all_ its subdirectories (e.g. ~ftp/bin and ~ftp/etc) should be owned by an account other than ftp, preferably root. Each of these directories requires read and execute permissions, but should grant write permission only to the owner of the directory (i.e. chmod(1) them to 755).

a) ~ftp/bin should contain a copy of the ls(1) program. It should be owned by root and the directory should have only execute permission set (i.e. chmod(1) 0111).
b) The directory ~ftp/etc should be created with owner root and file permissions set read only (0444). It may _optionally_ contain a passwd and group file (as specified in passwd(5) and group(5)). For security reasons these files should NOT be copied from /etc on the system in question. These files will only be used by ftpd to show the file owner and group of any files held in the anonymous FTP area.

Any entries in the ~ftp/etc/passwd and ~ftp/etc/group files should have an asterisk ("*") in the password field. The home directory and login shell entries in the ~ftp/etc/passwd file should be omitted.

For example, the ~ftp/etc/passwd file could contain an entry of the form (note that the UID and GID in this entry are NOT the same as in the example above):

    ftp-user:*:31:29:Anonymous FTP Account::

The ~ftp/etc/group file could contain:

    ftp-group:*:29:

IMPORTANT
---------

The ~ftp/etc/passwd and ~ftp/etc/group files are optional and are only provided for the convenience of the anonymous FTP user. They show the apparent ownership and group of a file or directory in the anonymous FTP subtree when an ls(1) command is issued from within the FTP session. The true ownership and group of a file or directory are given by the /etc/passwd and /etc/group files. It is prudent to change the name, ownerid and groupid so that they are not the same as in the /etc/passwd and /etc/group entries.

The contents of the system group and password files should not be made available to remote users at any time, since they provide additional information which may be useful when attempting to compromise the security of a site.

In systems with dynamic libraries (for example, systems running SunOS 4.X), a copy of these libraries and certain devices may also need to be added to the FTP subtree. Consult your documentation to determine if this is true in your case.

[VMS]

[other OS?]
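For the UNIX layout described above, the ownership and write-permission recommendations for directories can be checked mechanically. The following is a minimal sketch in Python, not a complete security audit: the function name audit_ftp_tree and its parameters are hypothetical, and it assumes the caller supplies the anonymous FTP home directory and the numeric UID of the anonymous FTP account.

```python
import os
import stat


def audit_ftp_tree(root, ftp_uid):
    """Return (path, problem) pairs for directories at or below root.

    A directory is reported if it is owned by the anonymous FTP account
    or if it grants write permission to group or others.
    """
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        info = os.lstat(dirpath)
        if info.st_uid == ftp_uid:
            problems.append((dirpath, "owned by the ftp user"))
        if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append((dirpath, "writable by group or others"))
    return problems
```

Run as root so that directories without read permission (such as a bin directory set to 0111) can still be traversed. A deliberately writable incoming directory would be reported by this sketch and would need to be reviewed by hand.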
Technical and Administrative Notes
----------------------------------

There are a few areas where potential problems concerning either security or administration might arise when running an anonymous FTP archive site. We will try to address some of these here. If you are not sure of the capabilities of the server software on your particular machine, it is a good idea to consult your system documentation or your software vendor. Many of these problems can be solved by using one of the freely distributable FTP servers now available from various anonymous FTP archives.

Technical Notes:
----------------

a) Whenever you are running an FTP archive (whether it is an anonymous account or not) it is a good idea to use an FTP server that provides logging capabilities. This will allow you to keep track of the various operations that users are performing on your system. Of course, this implies that time to review the logs should also be allocated as part of the day-to-day operation of the system.

It should be noted that this logged information usually contains the names or IP addresses of the hosts from which the clients connect. In the past, when there were many users on a system, this information didn't reveal much about who was doing what. However, in today's network environment, where many individual computers have in fact become _personal_ computers, this information can identify the actual user with a high degree of probability. It is considered inappropriate behavior to release this logged information to individuals or groups not directly associated with the maintenance of the archive.

Privacy rights have in many respects not been legally defined for computer environments. Thus it is up to each site administrator to see that privileged information is not consciously or inadvertently distributed.
b) The view of the file system that the FTP client has access to should be restricted, with only those files specifically intended for distribution actually visible. In the ideal case, this restriction should be enforced at the lowest possible level, preferably by the operating system itself. Application-level enforcement should be avoided. For example, some FTP servers try to restrict the movement of clients by filtering pathname requests. This is a weaker enforcement of access policies than that supplied by the operating system, and alternate servers which utilize OS support should be used where available.

c) Many sites maintain "incoming" directories which allow the uploading of information into the archive by the general public. These can be very useful for the easy distribution of data; however, they can also be used as a transfer point for files that should not be on your system. Most operating systems allow the creation of directories that are world writable but not world readable. If you really want to have an incoming directory, it is a good idea to configure it in this way so that the site administrator can examine and approve submissions before they are moved to their final location or made generally available.

Even with write-only incoming directories, problems can occur. Collaboration between remote users can mean that filenames have been agreed to beforehand and are thus accessible to the parties involved. It is suggested that quotas be imposed on the FTP user partition so that there is less likelihood of uploaded information causing the partition to become full.

We recommend that such incoming directories only be used in situations where they are necessary and where the benefits of such a directory outweigh the potential problems outlined above.

d) You should periodically check the permissions and ownerships of the files in your archive.
Many administrators have adopted the practice of transferring ownership of archive files to the FTP pseudo-user. The file permissions should then be scrutinized to make sure that individual files cannot now be modified by that user (unless, of course, that is specifically the intention). Remember, the "FTP user" is anyone using the anonymous account. The replacement of files with corrupted versions (viruses, trojan horses etc.) has been known to occur. Ideally, no files in the subtree rooted at ~ftp should be owned by the FTP user (as defined in the /etc/passwd file).

e) The anonymous FTP subtree of the file system should always be self contained. This means that references (for example, symbolic links) outside of this subtree cannot be resolved and are inaccessible to users of the system. In the specific case of UNIX systems, files or directories should not be hard linked to any part of the file system other than the ~ftp subtree.

f) Care should be taken when naming or renaming files in archives. The truism that names should be meaningful takes on a greater significance in this environment, since the name is often all that the remote user has to work with when trying to discover the contents of a file without actually retrieving it. If one is caching a file from another FTP site, renaming is usually not recommended, since the ability to determine whether the two files contain identical information can be lost.

Some operating systems allow the use of whitespace and non-printable characters in filenames, but their use is strongly discouraged since this can make the file inaccessible to the remote user. Additionally, characters such as '@', '!', '|', or "_" may not be available or may have special significance on remote systems and should be used with caution.

g) Very large files should be split into smaller pieces when placed in FTP archives.
The retrieval of large files can be difficult on unreliable or congested links since, if a failure occurs during transfer, it is usually not possible to restart from the point of failure and continue. The entire transfer has to be restarted. This can be time consuming and eat up network bandwidth. Currently, files of 500 - 600 kilobytes are usually considered the maximum desirable size. Files larger than this should be split.

h) As the site administrator you might want to consider creating a CNAME record in the Domain Name System for your Anonymous FTP Archive. This record usually takes the form "ftp." followed by your domain name. This allows you to move the archive from one physical host to another without requiring you to notify all users of the move. For example, the machine "quiche.cs.mcgill.ca" would have a CNAME record which gave it the alternate name "ftp.cs.mcgill.ca". Thus if the archive for the domain cs.mcgill.ca is moved to another host, only the CNAME record would need to change. This change would in most cases be completely transparent to your users.

i) You should ensure that you are running an up-to-date version of an FTP server, one which is not vulnerable to any known security problems. In particular, older operating system versions may have a vulnerable ftpd. If you identify your server as vulnerable, you should immediately discontinue service until you can install a fixed version.

Administrative Notes:
---------------------

a) Check the contents of your archive regularly to make sure that the files stored there can legally (and ethically) be accessed by the general public. Only information that is freely distributable or in the public domain should be placed in an anonymous FTP archive site. Information of unknown status should not be made generally available until its status is resolved.

Note: You should not assume that because the information was retrieved from another anonymous FTP archive that it is supposed to be generally available.
When in doubt, contact the original source of the information (if known). There have been many instances in the past of proprietary information being unknowingly distributed by uninformed archive administrators. This could prove to be an expensive mistake. Know what is in your archive.

b) It is wise to obtain files for caching on your system only from "reputable" sites around the net that are well known and are run in a professional manner.

c) Anonymous FTP site administrators should be aware that the storage of pornographic material in their archives may cause problems of a legal or (more likely) political nature. This is also true of other potentially offensive material such as that related to explosives, terrorism, etc. There have been a number of cases where the network provider for sites carrying such material has threatened termination of network access until the offending files have been removed.

Conclusion
----------

A well organized and maintained anonymous FTP archive can be a valuable asset to any organization and to the Internet as a whole. With proper attention to the security of the system, you can provide a safe environment to help distribute the work of your own group, or of any of several million users on the network, with minimal effort.

References
----------

[1] RFC 959 Postel, J.B.; Reynolds, J.K. File Transfer Protocol. 1985 October; 69 p. (Obsoletes RFC 765 [IEN 149])

[2] NSFnet stats available from nis.nsf.net via anonymous FTP in the directory nsfnet/statistics/1992.

[3] RFC 1296 Lottor, M. Internet Growth (1981-1991). 1992 January;

-Butch