A file descriptor is a non-negative integer used by a process to identify an open file. By default, 2000 file descriptors are available to each process. The open, pipe, creat, and fcntl subroutines all generate file descriptors. File descriptors are generally unique to each process, but they can be shared by child processes created with the fork subroutine or copied by the fcntl, dup, and dup2 subroutines.
File descriptors are indexes to the file descriptor table in the u_block area maintained by the kernel for each process. The most common ways for processes to obtain file descriptors are through open or creat operations or through inheritance from a parent process. When a fork operation occurs, the descriptor table is copied for the child process, which allows the child process equal access to the files used by the parent process.
The system file and file descriptor data structures track each process' access to a file and ensure data integrity.
Structure | Activity and Contents |
---|---|
file descriptor table | Translates an index number (file descriptor) in the table to an open file. File descriptor tables are created for each process and are located in the u_block area set aside for that process. Each of the entries in a file descriptor table has two fields: the flags area and the file pointer. The structure of the file descriptor table is: struct ufd { struct file *fp; int flags; } u_ufd[OPEN_MAX]; The close-on-exec (FD_CLOEXEC bit) flag can be set in the file descriptor table using the fcntl subroutine. The dup subroutine copies one file descriptor entry into another position in the same table. The fork subroutine creates an identical copy of the entire file descriptor table for a child process. |
system open file table | Contains entries for each open file. Two of the most important pieces of information tracked in a file table entry are the current offset referenced by all read or write operations to the file and the open mode (O_RDONLY, O_WRONLY, or O_RDWR) of the file. The open file data structure contains the current I/O offset for the file. The system treats each read/write operation as an implied seek to the current offset. Thus if x bytes are read or written, the pointer advances x bytes. The lseek subroutine can be used to reassign the current offset to a specified location in files that are randomly accessible. Stream-type files (such as pipes and sockets) do not use the offset because the data in the file is not randomly accessible. The File Descriptor and System File Table Relationship figure illustrates the anatomy of and interaction between the process' file descriptor table and the system file table. File descriptors are recycled in ascending order. |
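The way the current offset advances can be observed from user code. The following is a minimal sketch, not part of the original reference: the function name offset_after_write and the scratch path passed to it are hypothetical, and lseek with a zero displacement from SEEK_CUR is used only to query the offset without moving it.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes len bytes to a newly created (or truncated) file and returns the
 * current offset afterwards, or -1 on error.
 * path is a hypothetical scratch file chosen by the caller.
 */
off_t offset_after_write(const char *path, const char *buf, size_t len)
{
    off_t pos;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    pos = lseek(fd, 0, SEEK_CUR);   /* query the offset without moving it */
    close(fd);
    return pos;
}
```

Writing 5 bytes to a fresh file is expected to leave the offset at 5, matching the rule that the pointer advances by the number of bytes transferred.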
Because files can be shared by many users, it is necessary to allow related processes to share a common offset pointer and have a separate current offset pointer for independent processes that access the same file. The open file table entry maintains a reference count to track the number of file descriptors assigned to the file.
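Multiple references to a single file can be caused by any of the following: a separate process opening the file; a child process retaining the file descriptors assigned to its parent process; or the fcntl, dup, or dup2 subroutine creating copies of a file descriptor.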
The File Sharing figure demonstrates what happens in the file descriptor and system file tables when two processes open the same file. Each open operation creates a system table entry. Individual table entries give each process a separate current I/O offset. Independent offsets protect the integrity of the data.
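Independent offsets can be sketched in a few lines. For brevity, this hypothetical example opens the same file twice within a single process rather than from two processes; each open call still creates its own open file table entry. The function name and the scratch path are assumptions made for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Opens path twice, writes 4 bytes through the first descriptor, and
 * returns the offset seen through the second descriptor (-1 on error).
 * path is a hypothetical scratch file chosen by the caller.
 */
off_t offset_seen_by_second_open(const char *path)
{
    off_t pos = -1;
    int fd1 = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    int fd2 = open(path, O_RDONLY);

    /* Only the entry behind fd1 advances; fd2 has its own offset. */
    if (fd1 >= 0 && fd2 >= 0 && write(fd1, "data", 4) == 4)
        pos = lseek(fd2, 0, SEEK_CUR);

    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return pos;
}
```

The second descriptor is expected to still report offset 0 after the write through the first one.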
When a file descriptor is duplicated, two processes then share the same offset and interleaving can occur. Interleaving means that bytes are not read or written sequentially.
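A shared offset between two processes can be observed with the fork subroutine, which is described later in this section. The following is a sketch under stated assumptions: the function name and scratch path are hypothetical, and the child exits with _exit so that it does not run any of the parent's cleanup.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Opens path, forks, lets the child write 5 bytes through the inherited
 * descriptor, and returns the offset the parent sees afterwards
 * (-1 on error). path is a hypothetical scratch file.
 */
off_t parent_offset_after_child_write(const char *path)
{
    off_t pos = -1;
    int status;
    pid_t pid;
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

    if (fd < 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Child: the inherited descriptor refers to the same file table entry. */
        _exit(write(fd, "child", 5) == 5 ? 0 : 1);
    }

    /* Parent: wait for the child, then query the shared offset. */
    if (pid > 0 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        pos = lseek(fd, 0, SEEK_CUR);

    close(fd);
    return pos;
}
```

Because parent and child share one file table entry, the parent is expected to see offset 5 even though it wrote nothing itself. If both processes wrote concurrently, their output could interleave.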
There are three ways file descriptors can be duplicated between processes: the dup or dup2 subroutine, the fork subroutine, and the fcntl (file descriptor control) subroutine.
dup | Creates a copy of a file descriptor |
The duplicate is created at an empty space in the user file descriptor table that contains the original descriptor. A dup operation increments the reference count in the file table entry by 1 and returns the index number of the file descriptor where the copy was placed. See the Duplicate File Descriptors Created by the dup Subroutine figure.
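The effect of sharing one file table entry through dup can be sketched as follows. The function name and scratch path are hypothetical; the sketch writes through the original descriptor and queries the offset through the copy.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Duplicates a descriptor, writes 4 bytes through the original, and
 * returns the offset seen through the duplicate (-1 on error).
 * path is a hypothetical scratch file chosen by the caller.
 */
off_t offset_seen_by_duplicate(const char *path)
{
    off_t pos = -1;
    int copy;
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

    if (fd < 0)
        return -1;

    copy = dup(fd);                 /* second index, same file table entry */
    if (copy >= 0) {
        if (write(fd, "data", 4) == 4)
            pos = lseek(copy, 0, SEEK_CUR);
        close(copy);
    }
    close(fd);
    return pos;
}
```

Unlike two separate open calls, the duplicate is expected to report offset 4, because both descriptors refer to the same entry.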
dup2 | Scans for the requested descriptor assignment and closes the requested file descriptor if it is open |
The dup2 subroutine allows the process to designate which descriptor entry the copy will occupy, if a specific descriptor-table entry is required.
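As an illustration, the following sketch opens a file and moves it to a descriptor number chosen by the caller. The helper name open_at_descriptor and the scratch path are hypothetical; the sketch assumes the target number is within the process's descriptor limit.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Opens path for writing and places the descriptor at index target.
 * If target is already open, dup2 closes it first.
 * Returns target on success, -1 on error.
 */
int open_at_descriptor(const char *path, int target)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);

    if (fd < 0)
        return -1;
    if (fd == target)               /* already in the requested slot */
        return fd;
    if (dup2(fd, target) < 0) {
        close(fd);
        return -1;
    }
    close(fd);                      /* keep only the copy at target */
    return target;
}
```

A caller that needs the file at, say, descriptor 20 would call open_at_descriptor with 20 as the second argument and then use that number directly.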
fork | Creates a child process that inherits the file descriptors assigned to the parent process. See the Duplicate File Descriptors as a Result of a Fork Operation figure. The child process typically then execs a new program. At that point, inherited descriptors that had the close-on-exec flag set by the fcntl subroutine are closed. |
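Setting the close-on-exec flag with the fcntl subroutine might look like the following sketch. The helper name set_close_on_exec is hypothetical; it reads the existing descriptor flags first so that they are preserved.

```c
#include <fcntl.h>

/*
 * Sets the close-on-exec flag on fd, preserving any other descriptor
 * flags. Returns 0 on success, -1 on error.
 */
int set_close_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}
```

A parent would typically call this on descriptors that a program started through exec should not inherit.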
When the shell runs a program, it opens three files with file descriptors 0, 1, and 2. The default assignments for these descriptors are:
0 | Represents standard input. |
1 | Represents standard output. |
2 | Represents standard error. |
These default file descriptors are connected to the terminal, so that if a program reads file descriptor 0 and writes file descriptors 1 and 2, the program collects input from the terminal and sends output to the terminal. As the program uses other files, file descriptors are assigned in ascending order.
If I/O is redirected using the < (less than) or > (greater than) symbols, the shell's default file descriptor assignments are changed. For instance:
prog < FileX > FileY
changes the default assignments for file descriptors 0 and 1 from the terminal to the appropriate files. In this example, file descriptor 0 now refers to FileX and file descriptor 1 refers to FileY. File descriptor 2 has not been changed. The program does not need to know where its input comes from or where its output is sent, as long as file descriptor 0 represents the input file and 1 and 2 represent output files.
The following sample program illustrates the redirection of standard output:
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    void redirect_stdout(char *);

    int main(void)
    {
        printf("Hello world\n");            /* this printf goes to standard output */
        fflush(stdout);
        redirect_stdout("foo");             /* redirect standard output */
        printf("Hello to you too, foo\n");  /* this printf goes to file foo */
        fflush(stdout);
        return 0;
    }

    void redirect_stdout(char *filename)
    {
        int fd;

        /* open a new file */
        if ((fd = open(filename, O_CREAT | O_WRONLY, 0666)) < 0) {
            perror(filename);
            exit(1);
        }
        close(1);                           /* close old standard output */
        if (dup(fd) != 1) {                 /* dup new fd to standard output */
            fprintf(stderr, "Unexpected dup failure\n");
            exit(1);
        }
        close(fd);                          /* close original, new fd, no longer needed */
    }
The value for file descriptor 2 can also be reassigned, but this is rarely done.
Within the file descriptor table, each new descriptor is assigned the lowest descriptor number available at the time of the request. However, a specific value can be requested within the file descriptor table by using the dup2 subroutine.
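The lowest-available rule can be checked with a short sketch. The function name is hypothetical, and the sketch assumes a single-threaded process and the presence of /dev/null, which is used only as a file that can always be opened.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Opens two descriptors, closes the first, and opens a third.
 * Returns 1 if the third open reuses the number just freed,
 * 0 if it does not, and -1 on error.
 */
int reuses_lowest_free_descriptor(void)
{
    int first = open("/dev/null", O_RDONLY);
    int second = open("/dev/null", O_RDONLY);
    int third;
    int reused;

    if (first < 0 || second < 0) {
        if (first >= 0)
            close(first);
        if (second >= 0)
            close(second);
        return -1;
    }

    close(first);                           /* frees the lower number */
    third = open("/dev/null", O_RDONLY);    /* should land in the freed slot */
    reused = (third == first);

    if (third >= 0)
        close(third);
    close(second);
    return reused;
}
```

This is the same property the redirection sample relies on when it closes descriptor 1 and expects the next dup call to return 1.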
The number of file descriptors that can be allocated to a process is governed by a resource limit. The default value is set in the /etc/security/limits file and is typically 2000 (for compatibility with earlier releases). The limit can be changed by the ulimit command or the setrlimit subroutine. The maximum size is defined by the constant OPEN_MAX.
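A process can inspect and adjust its own limit with the getrlimit and setrlimit subroutines. The following is a minimal sketch, assuming the limit is exposed as the RLIMIT_NOFILE resource; the two helper names are hypothetical.

```c
#include <sys/resource.h>

/*
 * Stores the current (soft) limit on open descriptors in *out.
 * Returns 0 on success, -1 on error.
 */
int descriptor_soft_limit(rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/*
 * Changes the soft limit to wanted, leaving the hard limit untouched.
 * Returns 0 on success, -1 on error or if wanted exceeds the hard limit.
 */
int set_descriptor_soft_limit(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (wanted > rl.rlim_max)       /* the soft limit may not exceed the hard limit */
        return -1;
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

The change applies to the calling process and to children created afterwards; it does not modify the default stored in the /etc/security/limits file.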