# How to monitor users

by Seth Kenlon

A long time ago in UNIX history, users on a server were actual UNIX users with entries in ``/etc/shadow``, an interactive login shell, and a home directory. There were tools for admins to communicate with users and to monitor their activity, to avoid careless or malicious actions that would cause server resources to be unfairly allocated.

These days, your userbase is less likely to have entries in ``/etc/shadow``, instead being managed by a layer of abstraction, whether it's LDAP or Drupal or OpenShift. Then again, there are a lot more servers now, which means there are a lot more sysadmins logging in and out to perform maintenance. Where there's activity, there's opportunity for mistakes and confusion, so it's time to dust off those old monitoring tools and put them to good use.

Here are some of the monitoring commands you may have forgotten about (or never knew about) to help you track what's been happening on your server.

## who

First, the basics. The ``who`` command is provided by the GNU coreutils package, and its primary job is to parse the ``utmp`` file (``/var/run/utmp`` on most Linux systems) and report its findings. The ``utmp`` file logs the current users on the system. It doesn't necessarily show every process, because not all programs initiate ``utmp`` logging. In fact, your system may not even have a ``utmp`` file by default. In that case, ``who`` falls back upon ``/var/log/wtmp``, which records all logins and logouts.

The ``wtmp`` file format is exactly the same as ``utmp``, except that a null user name indicates a logout and the ``~`` character indicates a system shutdown or reboot. The ``wtmp`` file is maintained by ``login(1)``, ``init(1)``, and some versions of ``getty(8)``; however, none of these applications *creates* the file, so if you remove ``wtmp``, record-keeping is deactivated. That alone is good to know: if ``wtmp`` is missing, you should find out why!
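
Because a missing ``wtmp`` file silently disables record-keeping, it can be worth checking for it from a script. The following is a minimal sketch rather than a definitive tool: the function name ``check_login_log`` is hypothetical, and ``/var/log/wtmp`` is assumed as the default location, which may differ on your system.

```shell
# Hypothetical helper: report whether a login-record file exists and has content.
# The default path is an assumption; pass a different path as the first argument.
check_login_log() {
    log="${1:-/var/log/wtmp}"
    if [ ! -e "$log" ]; then
        # Nothing recreates the file automatically, so logins are not being recorded.
        echo "missing: $log"
        return 1
    fi
    if [ ! -s "$log" ]; then
        # The file exists but holds no records yet (for example, just after rotation).
        echo "empty: $log"
        return 0
    fi
    echo "ok: $log"
}
```

Calling ``check_login_log`` with no argument checks the assumed default path, and a non-zero exit status means the file is missing.
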
The output of ``who --heading`` looks something like this:

```
NAME     LINE         TIME             COMMENT
seth     tty2         2020-01-26 18:19 (tty2)
larry    pts/2        2020-01-28 13:02 (10.1.1.8)
curly    pts/3        2020-01-28 14:42 (10.1.1.5)
```

This shows you the username of each person logged in, the terminal line they're using, the time their login was recorded, and their IP address.

The ``who`` command also humbly provides the official [POSIX](https://opensource.com/article/19/7/what-posix-richard-stallman-explains) way of discovering which user *you* are logged in as, but only if ``utmp`` exists:

```
$ who -m
curly    pts/3        2020-01-28 14:44 (10.1.1.8)
```

It also provides a mechanism to display the current runlevel:

```
$ who -r
         run-level 5  2020-01-26 23:58
```

## w

For a little more context about users, the simple ``w`` command provides a list of who's logged in and what they're doing. This information is displayed in a format similar to the output of ``who``, but it adds the time the user has been idle, the CPU time used by all processes attached to the login TTY, and the CPU time used by just the current process. The user's current process is listed in the final field.

Sample output:

```
$ w
 13:45:48 up 29 days, 19:24,  2 users,  load average: 0.53, 0.52, 0.54
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
seth     tty2       Sun18   43:22m  0.01s  0.01s /usr/libexec/gnome-session-binary
curly    pts/2      13:02   35:12   0.03s  0.03s -bash
```

Alternatively, you can view the user's IP address with the ``-i`` or ``--ip-addr`` option.

You can narrow the output down to a single user name by specifying which user you want information about:

```
$ w seth
 13:45:48 up 29 days, 19:27,  2 users,  load average: 0.53, 0.52, 0.54
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
seth     tty2       Sun18   43:25m  0.01s  0.01s /usr/libexec/gnome-session-binary
```

## utmpdump

The ``utmpdump`` utility does (almost) exactly what its name suggests: it dumps the contents of the ``utmp`` file (``/var/run/utmp`` on most Linux systems) to your screen. Actually, it dumps *either* the ``utmp`` or the ``wtmp`` file, depending on which you specify.
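
As a rough sketch of specifying which file to dump, the wrapper below checks that ``utmpdump`` is installed and that the file is readable before dumping it. The function name ``dump_records`` is hypothetical; only the plain ``utmpdump FILE`` invocation is assumed.

```shell
# Hypothetical wrapper: dump a login-record file given as the first argument.
dump_records() {
    file="$1"
    if ! command -v utmpdump >/dev/null 2>&1; then
        echo "utmpdump not found" >&2
        return 127
    fi
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    # One record per line is written to standard output.
    utmpdump "$file"
}
```

For example, ``dump_records /var/log/wtmp | tail -n 5`` would show the last five records in the login history, assuming ``wtmp`` lives at that path on your system.
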
Of course the file you specify doesn't have to be located in ``/var/log`` or even named ``utmp`` or ``wtmp``, and it doesn't even have to be in the right format. If you feed ``utmpdump`` a text file, it dumps the contents to your screen (or a file, with the ``--output`` option) in a format that's predictable and easy to parse.

Normally, of course, you would just use ``who`` or ``w`` to parse login records, but ``utmpdump`` is useful in many instances.

* Files can get corrupted. While ``who`` and ``w`` are often able to detect corruption themselves, ``utmpdump`` is even more tolerant because it does no parsing on its own. It renders the raw data for you to deal with.
* Once you've repaired a corrupted file, ``utmpdump`` can patch your changes back in.
* Sometimes you just want to parse data yourself. Maybe you're looking for something that ``who`` and ``w`` aren't programmed to look for, or maybe you're trying to make correlations all your own.

Whatever the reason, ``utmpdump`` is a useful tool to extract raw data from the login records.

If you have repaired a corrupted login log, you can use ``utmpdump`` to write your changes back to the master log. The redirection is performed by your shell rather than by ``sudo``, so run the whole command in a root shell:

```
$ sudo sh -c 'utmpdump -r < wtmp.fix > /var/log/wtmp'
```

## ps

Once you know who's logged in on your system, you can use ``ps`` to get a snapshot of current processes. This isn't to be confused with [top](https://www.redhat.com/sysadmin/customize-top-command), which displays a running report on current processes; this is a snapshot taken the moment ``ps`` is issued, and then printed to your screen. There are advantages and disadvantages to both, so you can choose which to use based on your requirements. Because of its static nature, ``ps`` is particularly useful for later analysis, or just as a nice manageable summary.

The ``ps`` command is old and well-known, and it seems many admins have learned the old UNIX command rather than the latest implementation.
The modern ``ps`` (from the ``procps-ng`` package) offers many helpful mnemonics, and it's what ships on RHEL, CentOS, Fedora, and many other distributions, so it's what this article uses.

You can get all processes being run by a single user with the ``--user`` (or ``-u``) option, along with the name of the user you want a report on. To give the output the added context of which process is the parent of a child process, use the ``--forest`` option for a "tree" view:

```
$ ps --forest --user larry
    PID TTY          TIME CMD
  39707 ?        00:00:00 sshd
  39713 pts/4    00:00:00  \_ bash
  39684 ?        00:00:00 systemd
  39691 ?        00:00:00  \_ (sd-pam)
```

For every process on the system:

```
$ ps --forest -e
[...]
  29284 ?        00:00:48  \_ gnome-terminal-
  29423 pts/0    00:00:00  |   \_ bash
  42767 pts/0    00:00:00  |   |   \_ ps
  39631 pts/1    00:00:00  |   \_ bash
  39671 pts/1    00:00:00  |       \_ ssh
  32604 ?        00:00:00  \_ bwrap
  32612 ?        00:00:00  |   \_ bwrap
  32613 ?        00:09:05  |       \_ dring
  32609 ?        00:00:00  \_ bwrap
  32610 ?        00:00:15      \_ xdg-dbus-proxy
   1870 ?        00:00:05 gnome-keyring-d
   4809 ?        00:00:00  \_ ssh-agent
[...]
```

The default columns are useful, but you can change them to better suit what you're researching. The ``-o`` option gives you full control over which columns you see. For a full list of possible columns, refer to the **Standard Format Specifiers** section of the **ps(1)** man page.

```
$ ps -eo pid,user,pcpu,args --sort user
  42799 root      0.0 [kworker/u16:7-flush-253:1]
  42829 root      0.0 [kworker/0:2-events]
  42985 root      0.0 [kworker/3:0-events_freezable_power_]
   1181 rtkit     0.0 /usr/libexec/rtkit-daemon
   1849 seth      0.0 /usr/lib/systemd/systemd --user
   1857 seth      0.0 (sd-pam)
   1870 seth      0.0 /usr/bin/gnome-keyring-daemon --daemonize --login
   1879 seth      0.0 /usr/libexec/gdm-wayland-session /usr/bin/gnome-session
```

The ``ps`` command is very flexible. You can modify its output natively so you don't have to rely on ``grep`` and ``awk`` to find what you care about. Craft a good ``ps`` command, alias it to something memorable, and run it often.
It's one of the top ways to stay informed about what's happening on your server.

## pgrep

Sometimes, you may have some idea of a problematic process and need to investigate it instead of your users or system. To do that, there's the ``pgrep`` command from the ``procps-ng`` package.

At its most basic, ``pgrep`` works like a grep on the output of ``ps``:

```
$ pgrep bash
29423
39631
39713
```

Instead of listing the PIDs, you can just get a count of how many PIDs would be returned:

```
$ pgrep --count bash
3
```

To narrow your search, you can filter processes by user name (``-u``), terminal (``--terminal``), age (``--newest`` and ``--oldest``), and more. To find a process belonging to a specific user, for example:

```
$ pgrep bash -u moe --list-name
39631 bash
```

You can even get inverse matches with the ``--inverse`` option.

### pkill

Related to ``pgrep`` is the ``pkill`` command. It's a lot like the ``kill`` command, except that it uses the same options as ``pgrep`` so you can send signals to a troublesome process using whatever information is easiest for you.

For example, if you have discovered that a process initiated by user ``larry`` is monopolizing resources, and you know from ``w`` that ``larry`` is located on terminal ``pts/2``, then you can kill the login session and all of its children with just the terminal name:

```
$ sudo pkill -9 --terminal pts/2
```

Or you can use just the user name to end all processes matching it:

```
$ sudo pkill -u larry
```

Used judiciously, ``pkill`` is a good "panic" button or sledgehammer-style solution when a problem has gotten out of hand.

## Terminal monitoring

Just because a series of commands exists in a terminal doesn't mean they're necessarily better than other solutions. Take stock of your requirements and choose the best tool for what you need.
Sometimes a graphical monitoring and reporting system is exactly what you need, and other times terminal commands that are easily scripted and parsed are the right answer. Choose wisely, learn your tools, and you'll never be in the dark about what's happening within your bare metal.
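
As one illustration of how easily these commands can be scripted, the earlier advice about crafting a ``ps`` command and giving it a memorable name could be sketched as a small shell function. This assumes the ``procps-ng`` version of ``ps``; the name ``psuser`` and the choice of columns are illustrative, not part of any standard package.

```shell
# Hypothetical helper (not part of procps-ng): list one user's processes,
# highest CPU usage first. Defaults to the current user when no name is given.
psuser() {
    ps -o pid,user,pcpu,args --sort=-pcpu --user "${1:-$(id -un)}"
}
```

Add the function to your shell's startup file, then run ``psuser larry`` to get a report on ``larry``.
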