
50.005 Computer System Engineering
Information Systems Technology and Design
Singapore University of Technology and Design
Natalie Agus (Summer 2024)

System Programs

In this part, you are tasked to create 4 more system programs: sys (simple system information), dspawn (daemon spawn), dcheck (daemon check), and backup (zip a given directory and move the zipped archive to ./archive as a backup).

sys

The sys system program prints out basic information about your operating system as follows:

You are free to print out the information in any format. It should include some information about the Operating System, kernel, total memory size, user currently logged-in, and CPU. You are free to add any additional information you deem fit. This system program is intended to mimic the functionality of neofetch, a command-line system information tool:

You should place the sys source file inside [PROJECT_DIR]/source/system_programs/. The makefile will automatically detect this new source file and compile it into [PROJECT_DIR]/bin.

dspawn

The dspawn system program summons a daemon process and then terminates so that the shell may continue to display the next prompt. This is unlike other programs where the shell waits for it to finish before printing the next prompt.

Daemon Processes

Daemons are processes that are typically started when the system is bootstrapped and terminate only when the system is shut down. They don’t have a controlling terminal and they run in the background. In other words, a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user.

Your system should be running multiple daemon processes right now. As mentioned, these daemon processes are background services that start at boot time or when needed, running continuously to perform or manage system tasks and services without direct user interaction.

Examples of such daemons include:

  1. sshd, which handles Secure Shell (SSH) connections allowing secure remote access to the system;
  2. httpd or nginx, which serve web pages and manage HTTP traffic;
  3. crond, responsible for executing scheduled tasks;
  4. syslogd, which collects and manages system logs.

These daemons are essential for the regular operation, maintenance, and security of a Linux system, often starting with system initialization through scripts managed by the system’s init system (such as systemd, SysVinit, or Upstart). They typically run with elevated privileges (e.g., root privileges) to perform tasks that regular users cannot, ensuring smooth and secure operation of both servers and desktops.

To list daemon processes on a Linux system using the ps command, you can look for processes that do not have a controlling terminal (tty or pts). Traditionally, the process name of a daemon ends with the letter d, to clarify that the process is in fact a daemon and to differentiate it from a normal computer program. You can use the command ps axo tty,pid,ppid,comm | grep '[d]$' | sort -k 4 to try listing some daemons on your system:

For the sake of our lab and our machine’s health, our daemon terminates after a certain period of time and violates the traditional daemon definition, but we’re sure you get the idea.

Overgoogling

The basic information about daemons presented in this handout is sufficient. Do not over-Google about Daemons unless you are really interested in it. The concept of daemons alone is very complex and large, and is out of our scope.

Characteristics of a Daemon Process

A daemon process is still a normal process, running in user mode with certain characteristics which distinguish it from a normal process.

The characteristics of a daemon process are listed below.

No controlling terminal

By definition, a daemon process does not require direct user interaction and therefore must detach itself from any controlling terminal.

In the ps -ef output, if the TTY or TT column shows a ?, the process does not have a controlling terminal.

You might have heard about pts as well. PTS (Pseudo Terminal Slave) and TTY (Teletype) are both types of terminal interfaces in Unix-like operating systems, but they serve different purposes and operate in slightly different ways.

You can inspect which process uses pts and which uses tty in your current shell session with the following command ps a -o pid,ppid,tty,comm:

PPID is Typically 1 or 2

The PPID of a daemon process is typically 1 or 2: whoever created the daemon process must terminate so that the daemon process is adopted by the init process (or equivalent).

Although it is common, not all systems let the init process adopt orphaned processes. Modern systems use systemd, kthreadd, or launchd (or other designated descendant processes or equivalent) to adopt orphaned processes. The pid of init, systemd, or an equivalent process is also not always 1. Your daemon process’ ppid is acceptable as long as it equals the pid of init, of another special instance of init, or of systemd or an equivalent, and as long as it is consistent (that is, the same designated system process always adopts your daemon processes). When we test it in our system, we know exactly what process will adopt your daemon process, so you don’t need to worry.

To check the details of the processes with PID 1 and PID 2 on a Linux system, you can use the following command ps -f -p 1,2 (note that your output might differ):

Working directory: root

The working directory of the daemon process is typically the root (/) directory.

Closes all unneeded file descriptors

It closes all unneeded file descriptors. In UNIX and related computer operating systems, a file descriptor (FD, less frequently fildes) is an abstract indicator (handle) used to access a file or other input/output resource, such as a pipe or network socket.

Also, it closes fd 0, 1, and 2 and redirects them to /dev/null.

Logging

It logs important messages through a central logging facility, such as the BSD syslog.

For instance, Ubuntu uses the systemd system and service manager, which includes systemd-journald, a service that collects and manages journal entries from all parts of the system. However, rsyslog is still present and integrates with systemd-journald to provide traditional log file management.

You can view the system log using the command journalctl -o short-precise in Linux distros:

In macOS, you can use the following command instead log show --last 1s:

For our toy daemon, we should log to a designated log file, such as [PROJECT_DIR]/dspawn.log.

Privileges

Some daemons run with elevated privileges, depending on their tasks. We are not going to do this for security reasons. We let our daemon run with regular user privileges.

Implementation Notes

Incantation

The general steps to summon a daemon process inside the dspawn main function are as follows:

  1. fork() from the parent process (dspawn)
  2. Close parent with exit(1)
  3. In the child process (the intermediate process), call setsid() so that the child becomes a session leader and loses the controlling TTY
  4. Ignore SIGCHLD, SIGHUP
  5. fork() again, then parent (the intermediate) process terminates
  6. The child process (the daemon) sets new file permissions using umask(0). The daemon’s PPID at this point should be 1 or 2 (init or equivalent)
  7. Change working directory to root
  8. Close all open file descriptors using sysconf(_SC_OPEN_MAX) and redirect fd 0,1,2 to /dev/null
  9. Execute daemon_work(), this function supposedly never returns, but we will modify it to terminate after some time

Step 1: first fork

The fork() splits this process into two: the parent (process group leader) and the child process (that we will call intermediate process for the next few sections).

The reason for this fork() is so that dspawn returns immediately to cseshell and our shell does not wait for the daemon to exit. Daemons are background processes that do not exit until the system is shut down. We don’t want our shell to wait forever.

Step 2: exit dspawn

At this point, the shell will return while the intermediate process proceeds to spawn the daemon.

Step 3: setsid

By default, the child (intermediate) process is not a process group leader. It calls setsid() to become the session leader and lose its controlling TTY (terminal).

setsid() is only effective when called by a process that is not a process group leader. The fork() in step 1 ensures this. The system call setsid() is used to create a new session containing a single (new) process group, with the current process as both the session leader and the process group leader of that single process group. You can read more about setsid() here.

Group and Session leader

Compile and try this code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h> // declares wait()
#include <unistd.h>

int main(){
   pid_t pid = fork();
   if (pid == 0){
       // child: not a process group leader, so setsid() can succeed
       printf("Child process with pid %d, pgid %d, session id: %d\n", getpid(), getpgid(getpid()), getsid(getpid()));
       setsid(); // child tries setsid
       printf("Child process has setsid with pid %d, pgid %d, session id: %d\n", getpid(), getpgid(getpid()), getsid(getpid()));
   }
   else{
       // parent: a process group leader when launched from a shell
       printf("Parent process with pid %d, pgid %d, session id: %d\n", getpid(), getpgid(getpid()), getsid(getpid()));
       setsid(); // parent tries setsid
       printf("Parent process has setsid with pid %d, pgid %d, session id: %d\n", getpid(), getpgid(getpid()), getsid(getpid()));
       wait(NULL); // reap the child before exiting
   }
   return 0;
}

It results in such output:

Let’s analyse them line by line.

The first line: The parent process has pid == pgid, that is 75614.

  • This tells us that this process ‘iddemo’ is the process group leader, but not a session leader since the session id 27063 is not equal to the pid 75614.

The third line: When process 75614 forks, it has a child process with pid 75615. It is clear that since child pid != pgid, then the child process is not a session leader and is not a group leader either.

So who is 27063? We can type the command ps -a -j and find a process with pid 27063. Apparently, it’s the zsh, the shell itself, connected to the controlling terminal s002.

The third and fourth lines: When both the child and parent process attempt to call setsid,

  • In the child process, setsid effectively makes the pgid and session id to be equal to its pid, 75615.
  • In the parent process, setsid fails (it returns -1 with errno set to EPERM) and has no effect on the session id, since the manual states that setsid only makes the caller the session and process group leader if it is called by a process that is not a process group leader.

Thanks to Step 1, setsid in Step 3 works as intended, and our intermediate process now loses the controlling terminal (part of the requirements to be a daemon process).

Step 4: Ignore SIGCHLD and SIGHUP

This intermediate process is going to fork() one more time in Step 5 to create the daemon process.

By ignoring SIGCHLD, our daemon process (the child of this intermediate process) will NOT become a zombie process when it terminates. Normally, a child process becomes a zombie process if the parent does not wait for it.

  • Since SIGCHLD is ignored, when the daemon (child of this intermediate process) exits, it is reaped immediately.
  • However, the daemon will outlive the parent process anyway, so it does not really matter. This step is just for “in case”.

Also, this intermediate process is a session leader (from step 3, since we need to lose the controlling terminal).

  • If we terminate a session leader, a SIGHUP signal will be received and the children of the session leader will be killed.
  • We do not want our daemon (child of this process) to be killed, therefore we need to call signal(SIGHUP, SIG_IGN) first before forking in Step 5.

Step 5: second fork

The child of this intermediate process is the daemon process.

The second fork allows the parent process (the intermediate process) to terminate. This ensures that the child process is not a session leader.

Think!

Why must you ensure that the daemon is not a session leader? Since a daemon has no controlling terminal, if a daemon is a session leader, an act of opening a terminal device will make that device the controlling terminal.

We do not want this to happen with your daemon, so this second fork() handles this issue. As mentioned above, before forking it is necessary to ignore SIGHUP. This prevents the child from being killed when the parent (which is the session leader) dies.

Step 6: umask(0)

The daemon process calls umask(0) so that no permission bits are masked off the files it creates. This is so that a file it creates can be globally readable, writable, and executable by any other process, up to the mode requested when the file is created.

Setting the umask to 0 means that newly created files or directories keep every permission bit requested at creation, so any file created by this daemon can be accessed by other processes, because we can’t directly control the daemon anymore.

With a umask of zero, a file created with mode 0777 ends up with permission 0777, i.e. world-readable, writable, and executable. Some calls request 0666 instead of 0777 (fopen(), for instance), so files created that way end up with permission 0666.

How does setting umask(0) land you with 0777 file permission?

0777 actually stands for octal 777. In C, the first 0 indicates octal notation, and you can translate the rest in binary: 111 111 111, which means we will have - rwx rwx rwx, equivalent to global RW and executable for the file. If we want to restrict write permission to only the owner, we can set umask(022), which is equivalent to having permission 0755, with the binary: 111 101 101, which translates to - rwx r-x r-x.

The manual for umask can be found here.

Step 7: chdir to root directory

Change the current working directory to root using chdir("/"). If a daemon were to leave its current working directory unchanged, this would prevent the filesystem containing that directory from being unmounted while the daemon is running. It is therefore good practice for daemons to change their working directory to a safe location that will never be unmounted, like root.

Step 8: handle fd 0, 1, 2 and close all unused fds

Close all open file descriptors and redirect stdin, stdout, and stderr (fd 0, 1, and 2 by default in UNIX systems) to /dev/null so that it won’t reacquire them again if you mistakenly attempt to output to stdout or read from stdin.

Once it is running a daemon should NOT read from or write to the terminal from which it was launched. The simplest and most effective way to ensure this is to close the file descriptors corresponding to stdin, stdout and stderr. These should then be reopened, either to /dev/null, or if preferred to some other location.

There are two reasons for not leaving them closed:

  1. To prevent code that refers to these file descriptors from failing
  2. To prevent the descriptors from being reused when we call open() from the daemon’s code.

To close all open file descriptors, you need to loop through the possible file descriptors, then re-attach the first 3 fds using open() and dup(0). Note that open() and dup() will assign the smallest available file descriptor, which in this case is 0, 1, and 2 in sequence. You will learn more about this in the last OS chapter.

   /* Close all open file descriptors */
   int x;
   for (x = sysconf(_SC_OPEN_MAX); x>=0; x--)
   {
       close (x);
   }

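   /* Attach file descriptors 0, 1, and 2 to /dev/null.
    * open() and dup() return the lowest free descriptor,
    * so these calls yield 0, 1, and 2 in that order.
    * Needs <fcntl.h> for open() and <unistd.h> for dup(). */
   int fd0 = open("/dev/null", O_RDWR);
   int fd1 = dup(0);
   int fd2 = dup(0);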

Step 9: execute daemon_work()

You are free to implement daemon_work() however you like. Here’s one example:

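#include <limits.h> // PATH_MAX
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

char output_file_path[PATH_MAX];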

static int daemon_work()
{
    // put your full PROJECT_DIR path here  
    strcpy(output_file_path, "[PROJECT_DIR]/dspawn.log"); 

    int num = 0;
    FILE *fptr;
    char *cwd;
    char buffer[1024];

    // write PID of daemon in the beginning
    fptr = fopen(output_file_path, "a");
    if (fptr == NULL)
    {
        return EXIT_FAILURE;
    }

    fprintf(fptr, "Daemon process running with PID: %d, PPID: %d, opening logfile with FD %d\n", getpid(), getppid(), fileno(fptr));

    // then write cwd
    cwd = getcwd(buffer, sizeof(buffer));
    if (cwd == NULL)
    {
        perror("getcwd() error");
        fclose(fptr); // do not leak the log file handle on failure
        return 1;
    }

    fprintf(fptr, "Current working directory: %s\n", cwd);
    fclose(fptr);

    while (1)
    {

        // use appropriate location if you are using MacOS or Linux
        fptr = fopen(output_file_path, "a");

        if (fptr == NULL)
        {
            return EXIT_FAILURE;
        }

        fprintf(fptr, "PID %d Daemon writing line %d to the file.  \n", getpid(), num);
        num++;

        fclose(fptr);

        sleep(10);

        if (num == 10) // we just let this process terminate after 10 counts
            break;
    }

    return EXIT_SUCCESS;
}

Alternatively, you can store the current working directory of dspawn in its main function first, before the first fork() so that you don’t have to hardcode the [PROJECT_DIR] in daemon_work().

// in main() of dspawn.c 

    // Setup path
    if (getcwd(output_file_path, sizeof(output_file_path)) == NULL)
    {
        perror("getcwd() error, exiting now.");
        return 1;
    }
    strcat(output_file_path, "/dspawn.log"); 

Note that you won’t be able to simply do fptr = fopen("./dspawn.log", "a") in daemon_work because at this point, the daemon’s current working directory is root (/). Since we did not elevate the privileges of the daemon process, we won’t be able to create any new file in the root directory.

Observed Output

You should be able to spawn a daemon process using dspawn command. The shell should almost immediately return with a new prompt:

You should be able to run multiple dspawn commands to spawn multiple daemon processes. All daemons should write the “log” to [PROJECT_DIR]/dspawn.log.

A log file called dspawn.log should appear at [PROJECT_DIR] with the following content:

You can see clearly that the daemon process is adopted by the init (or equivalent) process with PID of 1, and that its current working directory is root /. The file dspawn.log is opened using fd 3. To check whether fd 0, 1, and 2 all point to /dev/null correctly, you can use the lsof -p [DAEMON_PID] command:

In Linux distros, everything is a file. Hence, you can read any process’ file descriptor directly using the command: ls -l /proc/[PID]/fd. lsof works on both Linux-based OS and macOS.

If you spawn multiple daemon processes at the same time, the log file should show an interleaved printout of messages from each daemon process:

If you run dspawn using WSL, your PPID may not be 1. Please find a proper Linux distro (be it using someone’s VM or dual boot) to confirm that your daemon process’ PPID is either 1, 2, or that of another dedicated process that “adopts” orphaned processes. Run the command pstree -p to help you understand the process hierarchy better.

dcheck

This system program simply checks how many daemons (spawned from dspawn) are alive right now:

Implementation Notes

The easiest way to find out how many daemon processes spawned from dspawn are running right now is by using the ps command, and filtering its output with the name dspawn that doesn’t have an output terminal:

The daemon process’ name inherits dspawn command name, hence they both have the same name.

ps -efj | grep dspawn  | grep -Ev 'tty|pts' 

From the screenshot example above, we have two daemon processes that are alive. This corresponds to the ps output, with two lines (entries) reported. You can execute any command from a C program using the C library function system().

hint

You might need to use some kind of output redirection to read and analyse the output of ps executed from your C program with system().

backup

The backup system program must be able to automatically zip a directory whose name matches environment variable BACKUP_DIR, and move this zipped directory to [PROJECT_DIR]/archive/. BACKUP_DIR must be set up by the shell before calling backup. The filename of the zipped file must include the correct datetime indicating when the backup command was executed.

Here’s an example of running backup before and after BACKUP_DIR environment variable is set:

You are free to decorate the output of backup command, as long as the bare minimum information as shown above is included.

It should also be able to zip a single file if BACKUP_DIR points to a file instead of a directory:

Summary

By the end of this task, your shell will have access to these 4 new system programs at [PROJECT_DIR]/bin.