50.005 Computer System Engineering
Information Systems Technology and Design
Singapore University of Technology and Design
Natalie Agus (Summer 2024)
System Programs
In this part, you are tasked with creating 4 more system programs: sys (simple system information), dspawn (daemon spawn), dcheck (daemon check), and backup (zip a certain directory and move the zipped file to ./archive as a backup).
sys
The sys system program prints out basic information about your operating system. You are free to print out the information in any format. It should include some information about the operating system, kernel, total memory size, currently logged-in user, and CPU. You are free to add any additional information you deem fit. This system program is intended to mimic the functionality of neofetch, a command-line system information tool.
You should place the sys source file inside [PROJECT_DIR]/source/system_programs/. The makefile will automatically detect this new source file and compile it into [PROJECT_DIR]/bin.
dspawn
The dspawn system program summons a daemon process and then terminates, so that the shell can display the next prompt right away. This is unlike other programs, where the shell waits for them to finish before printing the next prompt.
Daemon Processes
Daemons are processes that are typically started when the system is bootstrapped and terminate only when the system is shut down. They don’t have a controlling terminal and they run in the background. In other words, a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user.
Your system should be running multiple daemon processes right now. As mentioned, these daemon processes are background services that start at boot time or when needed, running continuously to perform or manage system tasks and services without direct user interaction.
Examples of such daemons include:
- sshd, which handles Secure Shell (SSH) connections, allowing secure remote access to the system;
- httpd or nginx, which serve web pages and manage HTTP traffic;
- crond, responsible for executing scheduled tasks;
- syslogd, which collects and manages system logs.
These daemons are essential for the regular operation, maintenance, and security of a Linux system. They often start during system initialization, through scripts managed by the system’s init system (such as systemd, SysVinit, or Upstart). They typically run with elevated privileges (e.g., root privileges) to perform tasks that regular users cannot, ensuring smooth and secure operation of both servers and desktops.
To list daemon processes on a Linux system using the ps command, you can look for processes that do not have a controlling terminal (no tty or pts entry). Traditionally, the process names of daemons end with the letter d, to make clear that the process is in fact a daemon and to distinguish it from a normal program. You can use the command ps axo tty,pid,ppid,comm | grep '[d]$' | sort -k 4 to try listing some daemons on your system.
For the sake of our lab and our machine’s health, our daemon terminates after a certain period of time and violates the traditional daemon definition, but we’re sure you get the idea.
Overgoogling
The basic information about daemons presented in this handout is sufficient. Do not over-Google daemons unless you are really interested in them: the topic is large and complex, and most of it is out of our scope.
Characteristics of a Daemon Process
A daemon process is still a normal process running in user mode, but it has certain characteristics that distinguish it from other processes.
These characteristics are listed below.
No controlling terminal
By definition, a daemon process does not require direct user interaction and therefore must detach itself from any controlling terminal.
In the ps -ef output, a ? in the TTY (or TT) column means that the process does not have a controlling terminal.
You might have heard about pts as well. PTS (Pseudo Terminal Slave) and TTY (Teletype) are both types of terminal interfaces in Unix-like operating systems, but they serve different purposes: a tty such as /dev/tty1 is typically a terminal device like the Linux text console, while a pts such as /dev/pts/0 is the slave side of a pseudo-terminal, created on demand for terminal emulator windows and SSH sessions.
You can inspect which processes use pts and which use tty in your current shell session with the command ps a -o pid,ppid,tty,comm.
PPID is Typically 1 or 2
The PPID of a daemon process is typically 1 or 2: whichever process created the daemon must terminate, so that the daemon is adopted by the init process (or equivalent).
Although this is common, not every system has the init process adopt orphaned processes. On modern Linux distros, PID 1 is usually systemd, and a process such as a per-user systemd instance can register as a subreaper, in which case it adopts the orphans among its descendants instead; on macOS, orphans are adopted by launchd. The PID of the adopting process is therefore not always 1. Your daemon’s PPID is acceptable as long as it equals the PID of init, systemd, or an equivalent designated process, and the same process consistently adopts your daemons. When we test it on our system, we know exactly which process will adopt your daemon, so you don’t need to worry.
To check the details of the processes with PID 1 and PID 2 on a Linux system, you can use the command ps -f -p 1,2 (your output might differ).
Working directory: root
The working directory of the daemon process is typically the root (/) directory.
Closes all unneeded file descriptors
It closes all unneeded file descriptors. In UNIX and related computer operating systems, a file descriptor (FD, less frequently fildes) is an abstract indicator (handle) used to access a file or other input/output resource, such as a pipe or network socket.
It also redirects fds 0, 1, and 2 to /dev/null.
Logging
It logs important messages through a central logging facility, such as BSD syslog.
For instance, Ubuntu uses the systemd system and service manager, which includes systemd-journald, a service that collects and manages journal entries from all parts of the system. However, rsyslog is still present and integrates with systemd-journald to provide traditional log file management.
On Linux distros, you can view the system log using the command journalctl -o short-precise.
On macOS, you can use the command log show --last 1s instead.
For our toy daemon, we log to a designated log file, such as [PROJECT_DIR]/dspawn.log.
Privileges
Some daemons run with elevated privileges, depending on their task. We are not going to do this, for security reasons: our daemon runs with regular user privileges.
Implementation Notes
Incantation
The general steps to summon a daemon process inside the dspawn main function are as follows:
1. fork() from the parent process (dspawn).
2. Terminate the parent with exit(EXIT_SUCCESS).
3. In the child process (the intermediate process), call setsid() so that the child becomes a session leader and loses its controlling TTY.
4. Ignore SIGCHLD and SIGHUP.
5. fork() again, then let the parent (the intermediate process) terminate.
6. In the child process (the daemon), set new file permissions using umask(0). The daemon’s PPID at this point should be 1 or 2 (init or equivalent).
7. Change the working directory to root.
8. Close all open file descriptors using sysconf(_SC_OPEN_MAX), and redirect fds 0, 1, and 2 to /dev/null.
9. Execute daemon_work(). This function supposedly never returns, but we will modify it to terminate after some time.
Step 1: first fork
The fork() splits this process into two: the parent (which may be a process group leader) and the child process (which we will call the intermediate process in the next few sections).
This fork() lets dspawn return immediately to cseshell, so that our shell does not wait for the daemon to exit. Daemons are background processes that do not exit until the system is shut down, and we don’t want our shell to wait forever.
Step 2: exit dspawn
At this point, dspawn exits and the shell prints its next prompt, while the intermediate process proceeds to spawn the daemon.
Step 3: setsid
The child (intermediate) process is by default not a process group leader. It calls setsid() to become a session leader and lose its controlling TTY (terminal).
setsid() is only effective when called by a process that is not a process group leader, and the fork() in Step 1 ensures this. The setsid() system call creates a new session containing a single (new) process group, with the calling process as both the session leader and the process group leader of that process group. The new session has no controlling terminal. You can read more in the setsid(2) manual page.
Group and Session leader
Compile and try this code:
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h> // for wait()
#include <unistd.h>

int main(){
    pid_t pid = fork();
    if (pid == 0){
        // child: its pid differs from its pgid, so it is not a process group leader
        printf("Child process with pid %d, pgid %d, session id: %d\n", getpid(), getpgid(getpid()), getsid(getpid()));
        setsid(); // child tries setsid
        printf("Child process has setsid with pid %d, pgid %d, session id: %d\n", getpid(), getpgid(getpid()), getsid(getpid()));
    }
    else{
        printf("Parent process with pid %d, pgid %d, session id: %d\n", getpid(), getpgid(getpid()), getsid(getpid()));
        setsid(); // parent tries setsid: fails if it is already a process group leader
        printf("Parent process has setsid with pid %d, pgid %d, session id: %d\n", getpid(), getpgid(getpid()), getsid(getpid()));
        wait(NULL); // reap the child
    }
    return 0;
}
```
It results in such output:
Let’s analyse them line by line.
The first line: the parent process has pid == pgid, that is 75614.
- This tells us that this process, iddemo, is the process group leader, but not a session leader, since the session id 27063 is not equal to the pid 75614.
The third line: when process 75614 forks, it creates a child process with pid 75615. Since the child’s pid != pgid, the child process is neither a session leader nor a group leader.
So who is 27063? We can type the command ps -a -j and find the process with pid 27063. It turns out to be zsh, the shell itself, connected to the controlling terminal s002.
The second and fourth lines show what happens when the parent and the child attempt to call setsid:
- In the child process, setsid effectively makes the pgid and session id equal to its pid, 75615.
- In the parent process, setsid fails (it returns -1 and sets errno to EPERM) and the session id is unchanged, because setsid only makes the caller a session and process group leader if the caller is not already a process group leader.
Thanks to Step 1, setsid in Step 3 works as intended, and our intermediate process now loses its controlling terminal (part of the requirement to be a daemon process).
Step 4: Ignore SIGCHLD and SIGHUP
This intermediate process is going to fork() one more time in Step 5 to create the daemon process.
By ignoring SIGCHLD, our daemon process (the child of this intermediate process) will NOT become a zombie process when it terminates. Normally, a child process that exits becomes a zombie until its parent calls wait for it.
- Since SIGCHLD is ignored, when the daemon (the child of this intermediate process) exits, it is reaped immediately.
- However, the daemon will outlive the intermediate process anyway, so this does not really matter. This step is just a precaution.
Also, this intermediate process is a session leader (from Step 3, since we need to lose the controlling terminal).
- When a session leader terminates, a SIGHUP signal may be sent to processes in its session, and the default action of SIGHUP terminates them.
- We do not want our daemon (the child of this process) to be killed, so we need to call signal(SIGHUP, SIG_IGN) before forking in Step 5. A disposition of SIG_IGN is inherited across fork(), so the daemon ignores SIGHUP too.
Step 5: second fork
The child of this intermediate process is the daemon process.
The second fork allows the parent (the intermediate process) to terminate, and ensures that the child process is not a session leader: the child stays in the session created in Step 3, but its pid differs from the session id.
Think!
Why must you ensure that the daemon is not a session leader? A daemon has no controlling terminal, and on System V-based systems such as Linux, if a session leader without a controlling terminal opens a terminal device (without the O_NOCTTY flag), that device becomes its controlling terminal.
We do not want this to happen to our daemon, and the second fork() prevents it. As mentioned above, it is necessary to ignore SIGHUP before forking, so that the child is not killed when its parent (the session leader) dies.
Step 6: umask(0)
The daemon process calls umask(0) so that the permissions requested for every new file it creates are kept as they are, making those files globally readable, writable, and (if requested) executable by any other process. This matters because we can no longer directly control the daemon once it is running.
A new file’s permission bits are the mode requested by the call that creates it with the umask bits cleared, that is, mode & ~umask. A process usually inherits a umask such as 022 from its shell, which strips write permission for the group and others; with a umask of 0, nothing is stripped, so a file created with a requested mode of 0777 really ends up world-readable, writable, and executable. Note that the umask only removes bits and never adds them: fopen() requests 0666, so a file created with fopen() under umask(0) gets 0666 (rw-rw-rw-).
How does setting umask(0) land you with 0777 file permissions?
0777 stands for octal 777. In C, the leading 0 indicates octal notation, and you can translate the rest into binary: 111 111 111, which means rwx rwx rwx, that is, read, write, and execute permission for the owner, the group, and everyone else. If we want to restrict write permission to only the owner, we can set umask(022) instead: a requested 0777 then becomes 0755, with the binary 111 101 101, which translates to rwx r-x r-x.
The manual for umask can be found in the umask(2) manual page.
Step 7: chdir to root directory
Change the current working directory to root using chdir("/"). If a daemon left its current working directory unchanged, the filesystem containing that directory could not be unmounted while the daemon was running. It is therefore good practice for daemons to change their working directory to a safe location that will never be unmounted, like root.
Step 8: handle fd 0, 1, 2 and close all unused fds
Close all open file descriptors and redirect stdin, stdout, and stderr (fds 0, 1, and 2 by default in UNIX systems) to /dev/null, so that any accidental attempt to write to stdout or read from stdin is harmless.
Once it is running, a daemon should NOT read from or write to the terminal from which it was launched. The simplest and most effective way to ensure this is to close the file descriptors corresponding to stdin, stdout, and stderr. These should then be reopened, either to /dev/null or, if preferred, to some other location.
There are two reasons for not leaving them closed:
- To prevent code that refers to these file descriptors from failing.
- To prevent the descriptors from being reused when we call open() from the daemon’s code. For example, if fd 1 were left closed, the daemon’s log file could be opened as fd 1, and a stray printf() would then write into the log.
To close all open file descriptors, loop through every possible descriptor number, then reopen /dev/null (it becomes fd 0) and re-attach fds 1 and 2 using dup(0). This works because open() and dup() always return the lowest-numbered available file descriptor, which here is 0, 1, and 2 in sequence. You will learn more about this in the last OS chapter.
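```c
/* Close all open file descriptors (needs <fcntl.h> and <unistd.h>).
 * _SC_OPEN_MAX is the number of descriptors a process may hold,
 * so the highest descriptor number is one less than that. */
long max_fd = sysconf(_SC_OPEN_MAX);
if (max_fd < 0)
    max_fd = 1024; /* assumed fallback if the limit is indeterminate */
for (long x = max_fd - 1; x >= 0; x--)
{
    close((int)x);
}
/*
 * Attach file descriptors 0, 1, and 2 to /dev/null. */
int fd0 = open("/dev/null", O_RDWR); /* lowest free fd: 0 */
int fd1 = dup(0);                    /* lowest free fd: 1 */
int fd2 = dup(0);                    /* lowest free fd: 2 */
```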
Step 9: execute daemon_work()
You are free to implement daemon_work() however you like. Here’s one example:
```c
#include <limits.h> // PATH_MAX
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

char output_file_path[PATH_MAX];

static int daemon_work()
{
    // put your full PROJECT_DIR path here
    // (skip this line if main() already filled in output_file_path)
    strcpy(output_file_path, "[PROJECT_DIR]/dspawn.log");

    int num = 0;
    FILE *fptr;
    char *cwd;
    char buffer[PATH_MAX];

    // write PID of daemon in the beginning
    fptr = fopen(output_file_path, "a");
    if (fptr == NULL)
    {
        return EXIT_FAILURE;
    }
    fprintf(fptr, "Daemon process running with PID: %d, PPID: %d, opening logfile with FD %d\n", getpid(), getppid(), fileno(fptr));

    // then write cwd
    cwd = getcwd(buffer, sizeof(buffer));
    if (cwd == NULL)
    {
        // stderr now points to /dev/null, so report the error in the log file
        fprintf(fptr, "getcwd() error\n");
        fclose(fptr);
        return EXIT_FAILURE;
    }
    fprintf(fptr, "Current working directory: %s\n", cwd);
    fclose(fptr);

    while (1)
    {
        // reopen the log on every iteration; fclose() flushes each line to disk
        fptr = fopen(output_file_path, "a");
        if (fptr == NULL)
        {
            return EXIT_FAILURE;
        }
        fprintf(fptr, "PID %d Daemon writing line %d to the file. \n", getpid(), num);
        num++;
        fclose(fptr);
        sleep(10);
        if (num == 10) // we just let this process terminate after 10 counts
            break;
    }
    return EXIT_SUCCESS;
}
```
Alternatively, you can store the current working directory of dspawn in its main function, before the first fork(), so that you don’t have to hardcode [PROJECT_DIR] in daemon_work().
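```c
// in main() of dspawn.c, before the first fork()
// Setup path
if (getcwd(output_file_path, sizeof(output_file_path)) == NULL)
{
    perror("getcwd() error, exiting now.");
    return 1;
}
// append the log file name without overflowing the buffer
strncat(output_file_path, "/dspawn.log", sizeof(output_file_path) - strlen(output_file_path) - 1);
```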
Note that you can’t simply call fptr = fopen("./dspawn.log", "a") in daemon_work(), because at this point the daemon’s current working directory is root (/). Since we did not elevate the daemon’s privileges, it cannot create new files in the root directory.
Observed Output
You should be able to spawn a daemon process using the dspawn command. The shell should return with a new prompt almost immediately.
You should be able to run the dspawn command multiple times to spawn multiple daemon processes. All daemons should write their “log” to [PROJECT_DIR]/dspawn.log.
A log file called dspawn.log should appear in [PROJECT_DIR].
The log should show that the daemon process has been adopted by the init (or equivalent) process with PID 1, that its current working directory is root (/), and that dspawn.log was opened using fd 3, the lowest free descriptor, since fds 0, 1, and 2 are already attached to /dev/null. To check whether fds 0, 1, and 2 all point to /dev/null correctly, you can use the lsof -p [DAEMON_PID] command.
In Linux, everything is a file, so you can also list any process’s file descriptors directly with the command ls -l /proc/[PID]/fd. lsof works on both Linux and macOS.
If you spawn multiple daemon processes at the same time, the log file should show interleaved messages from each daemon process.
If you run dspawn under WSL, your daemon’s PPID may not be 1. Please use a proper Linux installation (for example, someone’s VM or a dual boot) to confirm that your daemon process’s PPID is 1, 2, or another dedicated process that adopts orphaned processes. Run the command pstree -p to understand the process hierarchy better.
dcheck
This system program simply checks how many daemons spawned by dspawn are alive right now.
Implementation Notes
The easiest way to find out how many daemon processes spawned by dspawn are running right now is to use the ps command and filter its output for processes named dspawn that do not have a controlling terminal. The daemon process inherits the dspawn command name (fork() does not change it), so the daemon and dspawn have the same name:
ps -efj | grep dspawn | grep -Ev 'tty|pts'
In the screenshot example above, two daemon processes are alive, which corresponds to the two lines (entries) reported by ps. You can execute any shell command from a C program using the C library function system().
hint
You might need some kind of output redirection to read and analyse the output of ps executed from your C program with system().
backup
The backup system program must be able to automatically zip the directory whose path is given by the environment variable BACKUP_DIR, and move the zipped file to [PROJECT_DIR]/archive/. BACKUP_DIR must be set by the shell before backup is called. The filename of the zipped file must include the correct date and time at which the backup command was executed.
Here’s an example of running backup before and after the BACKUP_DIR environment variable is set:
You are free to decorate the output of the backup command, as long as the bare minimum information shown above is included.
It should also be able to zip a single file if BACKUP_DIR points to a file instead of a directory.
Summary
By the end of this task, your shell will have access to these 4 new system programs in [PROJECT_DIR]/bin.