----> UNIX Articles : Multiplexed I/O

----> Author : Paulus Gandung Prakosa <-> syn_attack (syn1988@sdf.lonestar.org)

----> Thanks to : mywisdom (devilzc0de.org), ketek (devilzc0de.org), schumbag (devilzc0de.org), chaer.newbie (devilzc0de.org), kiddies (devilzc0de.org), gunslinger_ (devilzc0de.org), ditatompel (devilzc0de.org)

 

----- Articles Begin -----

Applications often need to block on more than one file descriptor, juggling I/O between keyboard input (stdin), interprocess communication (IPC), and a handful of files. Modern event-driven graphical user interface (GUI) applications may contend with literally hundreds of pending events via their mainloops.

Without the aid of threads -- essentially servicing each file descriptor separately -- a single process cannot reasonably block on more than one file descriptor at the same time. Working with multiple file descriptors is fine, so long as they are always ready to be read from or written to. But as soon as one file descriptor that is not yet ready is encountered -- say, if a "read()" system call is issued, and there is not yet any data -- the process will block, no longer able to service the other file descriptors. It might block for just a few seconds, making the application inefficient and annoying the user. However, if no data becomes available on the file descriptor, it could block forever. Because file descriptors' I/O is often interrelated -- think pipes -- it is quite possible for one file descriptor not to become ready until another is serviced. This is potentially quite a problem, particularly for network applications, which may have many sockets open simultaneously.

Imagine blocking on a file descriptor related to interprocess communication while "stdin" has data pending. The application won't know that keyboard input is pending until the blocked IPC file descriptor ultimately returns data -- but what if the blocked operation never returns?

Enter multiplexed I/O.

Multiplexed I/O allows an application to concurrently block on multiple file descriptors, and receive notification when any one of them becomes ready to read or write without blocking. Multiplexed I/O thus becomes the pivot point for the application, designed similarly to the following :

  1. Multiplexed I/O : Tell me when any of these file descriptors are ready for I/O.
  2. Sleep until one or more file descriptors are ready.
  3. Woken up: What is ready?
  4. Handle all file descriptors ready for I/O, without blocking.
  5. Go back to step 1, and start over.
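
The five steps above can be sketched as a loop in C. This is only a minimal sketch built on the select() interface covered in the next section. The name drain_all() is a hypothetical helper, not part of any library, and the sketch assumes that every descriptor is below FD_SETSIZE and that giving up on the first error (including an interrupted call) is acceptable.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: read from every descriptor in fds[] until all of
 * them reach end-of-file. Returns the total number of bytes read, or -1
 * on error. Finished entries in fds[] are overwritten with -1. */
long drain_all(int *fds, int count)
{
	long total = 0;
	int open_count = count;

	while (open_count > 0) {
		fd_set readfds;
		int i, maxfd = -1;

		/* Step 1: say which descriptors we are interested in. */
		FD_ZERO(&readfds);
		for (i = 0; i < count; i++) {
			if (fds[i] < 0)
				continue;	/* already finished */
			FD_SET(fds[i], &readfds);
			if (fds[i] > maxfd)
				maxfd = fds[i];
		}

		/* Step 2: sleep until at least one of them is ready. */
		if (select(maxfd + 1, &readfds, NULL, NULL, NULL) == -1)
			return -1;

		/* Steps 3 and 4: find out what is ready and service it. */
		for (i = 0; i < count; i++) {
			char buf[256];
			ssize_t n;

			if (fds[i] < 0 || !FD_ISSET(fds[i], &readfds))
				continue;
			n = read(fds[i], buf, sizeof(buf));
			if (n < 0)
				return -1;
			if (n == 0) {		/* end-of-file: stop watching */
				fds[i] = -1;
				open_count--;
			} else {
				total += n;
			}
		}
		/* Step 5: go around again. */
	}
	return total;
}
```

Given the read ends of two pipes whose write ends have been closed, drain_all() returns the combined number of bytes that were written to both, no matter which pipe becomes ready first.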

Linux provides three multiplexed I/O solutions: the select, poll, and epoll interfaces.

select()

The select() system call provides a mechanism for implementing synchronous multiplexing I/O :

	#include <sys/time.h>
	#include <sys/types.h>
	#include <unistd.h>

	int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

	FD_CLR(int fd, fd_set *set);
	FD_ISSET(int fd, fd_set *set);
	FD_SET(int fd, fd_set *set);
	FD_ZERO(fd_set *set);
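
The four macros manipulate the sets passed to select(): FD_ZERO() empties a set, FD_SET() adds a file descriptor to it, FD_CLR() removes one, and FD_ISSET() tests whether a descriptor is a member. As a small sketch of how they fit together, the hypothetical helper count_members() below (not part of any library) walks a set and counts its members.

```c
#include <sys/select.h>

/* Hypothetical helper: count how many descriptors in the range
 * 0 .. nfds-1 are members of the given set. */
int count_members(fd_set *set, int nfds)
{
	int fd, count = 0;

	for (fd = 0; fd < nfds; fd++)
		if (FD_ISSET(fd, set))
			count++;
	return count;
}
```

After FD_ZERO() the count is 0. Adding descriptors 3 and 5 with FD_SET() makes it 2, and removing descriptor 3 with FD_CLR() brings it back to 1.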

The timeout parameter is a pointer to a timeval structure, which is defined as follows :

	#include <sys/time.h>

	struct timeval {
		long tv_sec;		/* seconds */
		long tv_usec;		/* microseconds */
	};

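On success, select() returns the number of file descriptors ready for I/O, among all three sets. If a timeout was provided, the return value may be 0. On error, the call returns -1, and errno is set to indicate the cause, for example EBADF (an invalid file descriptor was given in one of the sets), EINTR (a signal was caught while waiting), EINVAL (n is negative, or the timeout is invalid), or ENOMEM (insufficient memory was available).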

Because select() has historically been more readily implemented on various UNIX systems than a mechanism for subsecond-resolution sleeping, it is often employed as a portable way to sleep by providing a non-NULL timeout but NULL for all three sets :

	struct timeval tv;

	tv.tv_sec = 0;
	tv.tv_usec = 500;

	/* sleep for 500 microseconds */
	select(0, NULL, NULL, NULL, &tv);
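
The sets, the timeout, and the return value can be combined into one small routine. What follows is a minimal sketch, assuming a hypothetical helper named wait_readable() that watches a single descriptor for input. It is meant to illustrate the calling convention, not to serve as a complete implementation.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to usec microseconds for fd to become
 * readable. Returns 1 if it is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long usec)
{
	fd_set readfds;
	struct timeval tv;
	int ret;

	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);

	tv.tv_sec = usec / 1000000;
	tv.tv_usec = usec % 1000000;

	/* The first argument is the highest-numbered descriptor in any
	 * of the sets, plus one. */
	ret = select(fd + 1, &readfds, NULL, NULL, &tv);
	if (ret <= 0)
		return ret;	/* 0: timed out, -1: error (see errno) */

	return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

Called on the read end of an empty pipe, wait_readable() returns 0 once the timeout expires. After a byte has been written to the other end, it returns 1 without waiting.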

pselect()

The select() system call, first introduced in 4.2BSD, is popular, but POSIX defined its own solution, pselect(), in POSIX 1003.1g-2000 and later in POSIX 1003.1-2001 :

	#define _XOPEN_SOURCE	600
	#include <sys/select.h>

	int pselect(int n,
		    fd_set *readfds,
		    fd_set *writefds,
		    fd_set *exceptfds,
		    const struct timespec *timeout,
		    const sigset_t *sigmask);

	FD_CLR(int fd, fd_set *set);
	FD_ISSET(int fd, fd_set *set);
	FD_SET(int fd, fd_set *set);
	FD_ZERO(fd_set *set);

There are three differences between pselect() and select() :

  1. pselect() uses the timespec structure, not the timeval structure, for its timeout parameter. The timespec structure uses seconds and nanoseconds, not seconds and microseconds, providing theoretically superior timeout resolution. In practice, however, neither call reliably provides even microsecond resolution.
  2. A call to pselect() does not modify the timeout parameter. Consequently, this parameter does not need to be reinitialized on subsequent invocations.
  3. The select() system call does not have the sigmask parameter. With respect to signals, when this parameter is set to NULL, pselect() behaves like select().

The timespec structure is defined as follows :

	#include <time.h>

	struct timespec {
		long tv_sec;		/* seconds */
		long tv_nsec;		/* nanoseconds */
	};
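
As a minimal sketch of pselect() in use, the hypothetical helper wait_readable_nosigint() below waits for input on one descriptor while SIGINT is blocked for the duration of the call. The helper name and the choice of SIGINT are assumptions made for illustration only. Note that the timeout is passed as a pointer to const, which reflects the guarantee that pselect() leaves it untouched.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE	600
#endif
#include <signal.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to *ts for fd to become readable, with
 * SIGINT blocked while waiting. Returns 1 if fd is readable, 0 on
 * timeout, -1 on error. */
int wait_readable_nosigint(int fd, const struct timespec *ts)
{
	fd_set readfds;
	sigset_t mask;
	int ret;

	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);

	/* Start from the current signal mask and add SIGINT to it.
	 * pselect() installs this mask for the duration of the call
	 * and restores the previous mask before it returns. */
	if (sigprocmask(SIG_BLOCK, NULL, &mask) == -1)
		return -1;
	sigaddset(&mask, SIGINT);

	ret = pselect(fd + 1, &readfds, NULL, NULL, ts, &mask);
	if (ret <= 0)
		return ret;	/* 0: timed out, -1: error (see errno) */

	return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

The same timespec can be passed to the helper over and over without being reinitialized, which is the practical benefit of the second difference listed above.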

poll()

The poll() system call is System V's multiplexed I/O solution. It solves several deficiencies in select(), although select() is still often used (most likely out of habit, or in the name of portability) :

	#include <sys/poll.h>

	int poll(struct pollfd *fds, unsigned int nfds, int timeout);

Unlike select(), with its inefficient three bitmask-based sets of file descriptors, poll() employs a single array of nfds pollfd structures, pointed to by fds. The structure is defined as follows :

	#include <sys/poll.h>

	struct pollfd {
		int fd;		/* file descriptor */
		short events;	/* requested events to watch */
		short revents;	/* returned events witnessed */
	};

Each pollfd structure specifies a single file descriptor to watch. Multiple structures may be passed, instructing poll() to watch multiple file descriptors. The events field of each structure is a bitmask of events to watch for on that file descriptor. The user sets this field. The revents field is a bitmask of events that were witnessed on the file descriptor. The kernel sets this field on return. All of the events requested in the events field may be returned in the revents field. Valid events include POLLIN (there is data to read), POLLPRI (there is urgent data to read), and POLLOUT (writing will not block).

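In addition, the following events may be returned in the revents field even if they were not requested: POLLERR (an error occurred on the file descriptor), POLLHUP (a hang-up occurred on the file descriptor), and POLLNVAL (the file descriptor is invalid).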

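On success, poll() returns the number of file descriptors whose structures have non-zero revents fields. It returns 0 if the timeout occurred before any events occurred. On failure, -1 is returned, and the global variable errno is set to indicate the cause, for example EFAULT (fds points outside the process's address space), EINTR (a signal arrived before any requested event), EINVAL (nfds exceeds the RLIMIT_NOFILE limit), or ENOMEM (insufficient memory was available).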
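
To tie the pollfd fields and the return value together, here is a minimal sketch that watches two descriptors and reads from whichever is ready first. The helper read_first_ready() is hypothetical and exists only for this illustration. The sketch includes <poll.h>, the POSIX name for the header, and assumes that treating a timeout and an error alike is acceptable.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for input on
 * fd0 or fd1, then read from the first one that is ready. Returns the
 * number of bytes read and stores the index (0 or 1) of that descriptor
 * in *which. Returns -1 on timeout or error. */
ssize_t read_first_ready(int fd0, int fd1, int timeout_ms,
			 char *buf, size_t len, int *which)
{
	struct pollfd fds[2];
	int i, ret;

	fds[0].fd = fd0;
	fds[0].events = POLLIN;		/* watch for data to read */
	fds[1].fd = fd1;
	fds[1].events = POLLIN;

	ret = poll(fds, 2, timeout_ms);
	if (ret <= 0)
		return -1;		/* 0: timed out, -1: error */

	for (i = 0; i < 2; i++) {
		/* POLLHUP and POLLERR may be reported even though only
		 * POLLIN was requested; read() then returns
		 * end-of-file or the error. */
		if (fds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
			*which = i;
			return read(fds[i].fd, buf, len);
		}
	}
	return -1;	/* nothing readable, e.g. only POLLNVAL was set */
}
```

Because the kernel reports results in revents and leaves events alone, the same array could be handed to poll() again without being rebuilt.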

ppoll()

Linux provides a ppoll() cousin to poll(), in the same vein as pselect(). Unlike pselect(), however, ppoll() is a Linux-specific interface :

	#define _GNU_SOURCE
	#include <sys/poll.h>

	int ppoll(struct pollfd *fds,
		  nfds_t nfds,
		  const struct timespec *timeout,
		  const sigset_t *sigmask);

Differences between the poll() and select() system calls

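Although they perform the same basic job, the poll() system call is superior to select() for a handful of reasons: it takes a single array of pollfd structures instead of three bitmask-based sets, it does not require the caller to calculate the value of the highest-numbered file descriptor plus one, it is not limited by the fixed size of fd_set, and it reports results in a separate revents field, so the input does not have to be rebuilt before every call.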

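The select() system call does have a few things going for it, though: it is more portable, as some UNIX systems do not support poll(), and its timeout is specified in microseconds, whereas poll() takes milliseconds (although neither is reliably honored at that resolution in practice).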

----- End Articles -----

 
