# How to monitor users
by Seth Kenlon

A long time ago in UNIX history, users on a server were actual UNIX users, with entries in ``/etc/shadow``, an interactive login shell, and a home directory.
There were tools for admins to communicate with users and to monitor their activity, so that careless mistakes or malicious actions didn't cause server resources to be unfairly allocated.
These days, your userbase is less likely to have entries in ``/etc/shadow``, instead being managed by a layer of abstraction, whether it's LDAP or Drupal or OpenShift.
Then again, there are a lot more servers now, which means there are a lot more sysadmins logging in and out to perform maintenance.
Where there's activity, there's opportunity for mistakes and confusion, so it's time to dust off those old monitoring tools and put them to good use.

Here are some of the monitoring commands you may have forgotten about (or never knew about) to help you track what's been happening on your server.

## who

First, the basics.

The ``who`` command is provided by the GNU coreutils package, and its primary job is to parse the ``/var/run/utmp`` file and report its findings.

The ``utmp`` file logs the current users on the system.
It doesn't necessarily show every process, because not all programs initiate ``utmp`` logging.
In fact, your system may not even have a ``utmp`` file by default.
In that case, you can point ``who`` at ``/var/log/wtmp`` instead (``who /var/log/wtmp``), which records all logins and logouts.

The ``wtmp`` file format is exactly the same as ``utmp``, except that a null user name indicates a logout and the ``~`` character indicates a system shutdown or reboot.
The ``wtmp`` file is maintained by ``login(1)``, ``init(1)``, and some versions of ``getty(8)``, but none of these applications *creates* the file, so if you remove ``wtmp``, record-keeping is deactivated.
That alone is good to know: if ``wtmp`` is missing, you should find out why!

The output of ``who --heading`` looks something like this:

```
NAME     LINE     TIME               COMMENT 
seth     tty2     2020-01-26 18:19   (tty2)
larry    pts/2    2020-01-28 13:02   (10.1.1.8)
curly    pts/3    2020-01-28 14:42   (10.1.1.5)
```

This shows you the username of each person logged in, the terminal line they're on, the time their login was recorded, and the host or IP address they connected from.
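Because the first column of ``who`` output is always the user name, it's easy to summarize. Here's a minimal sketch that counts login sessions per user; the function name ``sessions_per_user`` is made up for this example, and it assumes one session per line, as in the output above.

```shell
# sessions_per_user: read who-style lines on stdin and print "count user",
# busiest user first. The function name is hypothetical.
sessions_per_user() {
    awk '{ count[$1]++ } END { for (u in count) print count[u], u }' |
        sort -k1,1nr -k2,2
}

# Typical use:
#   who | sessions_per_user
```

A user with an unexpectedly high session count is often worth a closer look with the other tools in this article.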

The ``who`` command also humbly provides the official [POSIX](https://opensource.com/article/19/7/what-posix-richard-stallman-explains) way of discovering which user *you* are logged in as, but only if ``utmp`` exists:

```
$ who -m
curly   pts/3   2020-01-28 14:44 (10.1.1.8)
```

It also provides a mechanism to display the current runlevel:

```
$ who -r 
     run-level 5   2020-01-26 23:58
```

## w

For a little more context about users, the simple ``w`` command provides a list of who's logged in and what they're doing.
This information is displayed in a format similar to the output of ``who``, but it adds the time the user has been idle, the CPU time used by all processes attached to the login TTY, and the CPU time used by just the current process.
The user's current process is listed in the final field.

Sample output:

```
$ w
 13:45:48 up 29 days, 19:24,  2 users,  load average: 0.53, 0.52, 0.54
USER     TTY     LOGIN@  IDLE    JCPU   PCPU WHAT
seth     tty2    Sun18   43:22m  0.01s  0.01s /usr/libexec/gnome-session-binary
curly    pts/2   13:02   35:12   0.03s  0.03s -bash
```

Alternatively, you can view the user's IP address with the ``-i`` or ``--ip-addr`` option.

You can narrow the output down to a single user name by specifying which user you want information about:

```
$ w seth
 13:45:48 up 29 days, 19:27,  2 users,  load average: 0.53, 0.52, 0.54
USER     TTY     LOGIN@  IDLE    JCPU   PCPU WHAT
seth     tty2    Sun18   43:25m  0.01s  0.01s /usr/libexec/gnome-session-binary
```

## utmpdump

The ``utmpdump`` utility does (almost) exactly what its name suggests: it dumps the contents of the ``/var/run/utmp`` file to your screen.
Actually, it dumps *either* the ``utmp`` or the ``wtmp`` file, depending on which you specify.
Of course the file you specify doesn't have to be located in ``/var/run`` or ``/var/log``, or even named ``utmp`` or ``wtmp``, and it doesn't even have to be in the right format.
If you feed ``utmpdump`` a text file, it dumps the contents to your screen (or a file, with the ``--output`` option) in a format that's predictable and easy to parse.

Normally, of course, you would just use ``who`` or ``w`` to parse login records, but ``utmpdump`` is useful in several situations:

* Files can get corrupted. While ``who`` and ``w`` are often able to detect corruption themselves, ``utmpdump`` is even more tolerant because it does no parsing on its own. It renders the raw data for you to deal with.
* Once you've repaired a corrupted file, ``utmpdump`` can patch your changes back in.
* Sometimes you just want to parse data yourself. Maybe you're looking for something that ``who`` and ``w`` aren't programmed to look for, or maybe you're trying to make correlations all your own.

Whatever the reason, ``utmpdump`` is a useful tool to extract raw data from the login records.
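As an example of parsing the data yourself, here's a rough sketch that pulls login records out of ``utmpdump`` text. It assumes the usual bracketed layout (type, PID, ID, user, line, host, address, time) and that record type 7 marks a normal user login; the function name ``user_logins`` is invented for illustration, and fields that themselves contain a ``]`` would confuse it.

```shell
# user_logins: read utmpdump text on stdin and print user, line, host, and
# time for each record of type 7 (a user login). Hypothetical helper name.
user_logins() {
    awk -F']' '{
        # Each field looks like " [value   "; strip the bracket and padding.
        for (i = 1; i <= 8; i++) gsub(/^ *\[ *| +$/, "", $i)
        if ($1 == "7") print $4, $5, $6, $8
    }'
}

# Typical use (utmpdump prints its banner on stderr):
#   utmpdump /var/log/wtmp 2>/dev/null | user_logins
```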

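If you have repaired a corrupted login log, you can use ``utmpdump`` to write your changes back to the master log.
Let ``utmpdump`` write the file itself with ``--output``, because a shell redirect (``>``) is performed by your own unprivileged shell, not by ``sudo``:

```
$ sudo utmpdump -r --output /var/log/wtmp wtmp.fix
```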
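Before you overwrite the live log, it can be worth confirming that your edited text converts cleanly. The following is only a sketch: ``verify_wtmp_text`` is a hypothetical helper that converts the text to binary in a scratch file, dumps it back, and compares the result with what you edited. A difference is a cue to look closer, not proof of damage, because ``utmpdump`` may normalize some fields.

```shell
# verify_wtmp_text: check that an edited dump survives a round trip.
# Usage: verify_wtmp_text wtmp.fix   (hypothetical helper and file name)
verify_wtmp_text() {
    [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Text -> binary into a scratch file, never the live log,
    # then binary -> text again, compared with the file you edited.
    utmpdump -r < "$1" > "$tmp" 2>/dev/null &&
        utmpdump "$tmp" 2>/dev/null | diff - "$1" > /dev/null
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

A zero exit status means the round trip reproduced your text exactly. Keeping a copy of the original ``wtmp`` before replacing it is a sensible extra precaution.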

## ps

Once you know who's logged in on your system, you can use ``ps`` to get a snapshot of current processes.
This isn't to be confused with [top](https://www.redhat.com/sysadmin/customize-top-command), which displays a running report on current processes; ``ps`` takes a snapshot the moment it's issued and prints it to your screen.
There are advantages and disadvantages to both, so you can choose which to use based on your requirements.
Because of its static nature, ``ps`` is particularly useful for later analysis, or just as a nice manageable summary.

The ``ps`` command is old and well-known, and it seems many admins have learned the old UNIX command rather than the latest implementation.
The modern ``ps`` (from the ``procps-ng`` package) offers many helpful mnemonics, and it's what ships on RHEL, CentOS, Fedora, and many other distributions, so it's what this article uses. 

You can get all processes being run by a single user with the ``--user`` (or ``-u``) option, followed by the name of the user you want a report on.
To give the output the added context of which process is the parent of a child process, use the ``--forest`` option for a "tree" view:

```
$ ps --forest --user larry
  PID TTY        TIME     CMD
  39707 ?        00:00:00 sshd
  39713 pts/4    00:00:00  \_ bash
  39684 ?        00:00:00 systemd
  39691 ?        00:00:00  \_ (sd-pam)
```

For every process on the system:

```
$ ps --forest -e
[...]
  29284 ?        00:00:48  \_ gnome-terminal-
  29423 pts/0    00:00:00  |   \_ bash
  42767 pts/0    00:00:00  |   |   \_ ps
  39631 pts/1    00:00:00  |   \_ bash
  39671 pts/1    00:00:00  |       \_ ssh
  32604 ?        00:00:00  \_ bwrap
  32612 ?        00:00:00  |   \_ bwrap
  32613 ?        00:09:05  |       \_ dring
  32609 ?        00:00:00  \_ bwrap
  32610 ?        00:00:15      \_ xdg-dbus-proxy
   1870 ?        00:00:05 gnome-keyring-d
   4809 ?        00:00:00  \_ ssh-agent
[...]
```

The default columns are useful, but you can change them to better suit what you're researching.
The ``-o`` option gives you full control over which columns you see. 
For a full list of possible columns, refer to the **Standard Format Specifiers** section of the **ps(1)** man page.

```
$ ps -eo pid,user,pcpu,args --sort user
  42799 root      0.0 [kworker/u16:7-flush-253:1]
  42829 root      0.0 [kworker/0:2-events]
  42985 root      0.0 [kworker/3:0-events_freezable_power_]
   1181 rtkit     0.0 /usr/libexec/rtkit-daemon
   1849 seth      0.0 /usr/lib/systemd/systemd --user
   1857 seth      0.0 (sd-pam)
   1870 seth      0.0 /usr/bin/gnome-keyring-daemon --daemonize --login
   1879 seth      0.0 /usr/libexec/gdm-wayland-session /usr/bin/gnome-session
```
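One thing ``ps`` doesn't do on its own is add columns up. As a small sketch, this totals the ``pcpu`` column per user; ``cpu_by_user`` is a made-up name, and the trailing ``=`` on each column name in the usage line is what suppresses the header so that only data reaches the function.

```shell
# cpu_by_user: read "user pcpu" pairs on stdin and print each user's total,
# highest first. The function name is hypothetical.
cpu_by_user() {
    LC_ALL=C awk '{ total[$1] += $2 }
        END { for (u in total) printf "%s %.1f\n", u, total[u] }' |
        LC_ALL=C sort -k2,2nr -k1,1
}

# Typical use:
#   ps -eo user=,pcpu= | cpu_by_user
```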

The ``ps`` command is very flexible.
You can modify its output natively so you don't have to rely on ``grep`` and ``awk`` to find what you care about.
Craft a good ``ps`` command, alias it to something memorable, and run it often. 
It's one of the top ways to stay informed about what's happening on your server.
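As one possible starting point, here's a sketch of such a shortcut written as a shell function, which behaves like an alias but also works inside scripts. The name ``pstop`` and the choice of columns are just assumptions for this example, and ``--sort`` relies on the ``procps-ng`` version of ``ps``.

```shell
# pstop: show the N busiest processes by CPU (default 10), plus the header.
# The function name and column selection are hypothetical.
pstop() {
    ps -eo pid,user,pcpu,pmem,etime,args --sort=-pcpu |
        head -n "$(( ${1:-10} + 1 ))"
}

# Typical use:
#   pstop      # top 10
#   pstop 3    # top 3
```

Put the function in your shell startup file (for example, ``~/.bashrc``) to make it available in every session.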

## pgrep

Sometimes, you may have some idea of a problematic process and need to investigate it instead of your users or system.
To do that, there's the ``pgrep`` command from the ``procps-ng`` package.

At its most basic, ``pgrep`` works like a grep on the output of ``ps``:

```
$ pgrep bash
29423
39631
39713
```

Instead of listing the PIDs, you can just get a count of how many PIDs would be returned:

```
$ pgrep --count bash
3
```

To refine your search, you can filter processes by user name (``-u``), terminal (``--terminal``), age (``--newest`` and ``--oldest``), and more.
To find a process belonging to a specific user, for example:

```
$ pgrep bash -u moe --list-name
39631 bash
```

You can even get inverse matches with the ``--inverse`` option.
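Since ``pgrep`` returns PIDs, its output can be handed straight to ``ps`` for more detail. This is a minimal sketch rather than a standard tool: ``describe_matches`` is a hypothetical name, it accepts whatever selection options you would give ``pgrep``, and it assumes the ``procps-ng`` versions of both commands.

```shell
# describe_matches: pass any pgrep selection options and get a ps report
# on the matching processes. The function name is hypothetical.
describe_matches() {
    local pids
    # -d, joins the PIDs with commas, which is the list format ps -p accepts.
    pids=$(pgrep -d, "$@") || return 1
    ps -o pid,user,etime,args -p "$pids"
}

# Typical use:
#   describe_matches -u moe bash
```

The function returns a non-zero status when nothing matches, so it can also be used as a test in scripts.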

### pkill 

Related to ``pgrep`` is the ``pkill`` command.
It's a lot like the ``kill`` command, except that it uses the same options as ``pgrep`` so you can send signals to a troublesome process using whatever information is easiest for you.

For example, if you have discovered that a process initiated by user ``larry`` is monopolizing resources, and you know from ``w`` that ``larry`` is located on terminal ``pts/2``, then you can kill the login session and all of its children with just the terminal name:

```
$ sudo pkill -9 --terminal pts/2
```

Or you can use just the user name to end all processes matching it:

```
$ sudo pkill -u larry
```
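When there's time for a gentler approach, you can preview the matches first and escalate only if needed. The sketch below is one way to do that, under the assumption that a few seconds is enough for a process to exit cleanly; ``graceful_stop`` and ``GRACE_SECONDS`` are names invented for this example, and the function takes the same selection options as ``pgrep`` and ``pkill``.

```shell
# graceful_stop: list the matching processes, send SIGTERM, wait, and send
# SIGKILL only to whatever is still running. Hypothetical helper name.
graceful_stop() {
    # Preview what will be signalled; stop here if nothing matches.
    pgrep -l "$@" || return 1
    pkill -TERM "$@"
    sleep "${GRACE_SECONDS:-5}"
    # Anything still matching ignored SIGTERM, so escalate.
    if pgrep "$@" > /dev/null; then
        pkill -KILL "$@"
    fi
}

# Typical use (as root, or with sudo inside a root shell):
#   graceful_stop --terminal pts/2
#   graceful_stop -u larry
```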

Used judiciously, ``pkill`` is a good "panic" button or sledgehammer-style solution when a problem has gotten out of hand.

## Terminal monitoring

Just because a series of commands exists in a terminal doesn't mean they're necessarily better than other solutions.
Take stock of your requirements and choose the best tool for what you need.
Sometimes a graphical monitoring and reporting system is exactly what you need, and other times terminal commands that are easily scripted and parsed are the right answer.
Choose wisely, learn your tools, and you'll never be in the dark about what's happening within your bare metal.