Quote:
Originally Posted by
dhirend_6d
What Is a Kernel?
The UNIX kernel is the software that manages a user program's access
to the system's hardware and software resources. These resources range
from CPU time and access to memory, through reading and writing the
disk drives, to connecting to the network and interacting with the
terminal or GUI interface. The kernel makes this all possible by
controlling and providing access to memory, processors, input/output
devices, disk files, and special services for user programs.
Kernel Services
The basic UNIX kernel can be broken into four main subsystems:
Process Management
Memory Management
I/O Management
File Management
These subsystems should be viewed as separate entities that work in
concert to provide services to a program that enable it to do meaningful
work. These management subsystems make it possible for a user to access
a database via a Web interface, print a report, or do something as
complex as managing a 911 emergency system. At any moment in the system,
numerous programs may request services from these subsystems. It is the
kernel's responsibility to schedule work and, if the process is
authorized, grant access to these subsystems. In short, programs
interact with the subsystems via software libraries and the system
call interface. We'll start by looking at how the UNIX kernel comes to
life by way of the system initialization process.
System Initialization
System initialization (booting) is the first step toward bringing your
system into an operational state. A number of machine-dependent and
machine-independent steps are gone through before your system is ready
to begin servicing users. At system startup, there is nothing running on
the Central Processing Unit (CPU). The kernel is a complex program that
must have its binary image loaded at a specific address from some type
of storage device, usually a disk drive. The boot disk maintains a small
restricted area called the boot sector that contains a boot program
that loads and initializes the kernel. You'll find that this is a
vendor-specific procedure that reflects the architectural hardware differences
between the various UNIX vendor platforms. When this step is completed,
the CPU must jump to a specific memory address and start executing the
code at that location. Once the kernel is loaded, it goes through its
own hardware and software initialization.
Kernel Mode
The operating system, or kernel, runs in a privileged manner known as
kernel mode. This mode of operation allows the kernel to run without
being interfered with by other programs currently in the system. The
microprocessor enforces this line of demarcation between user and kernel
level mode. With the kernel operating in its own protected address
space, it is guaranteed to maintain the integrity of its own data
structures and that of other processes. (That's not to say that a
privileged process could not inadvertently cause corruption within the
kernel.) These data structures are used by the kernel to manage and
control itself and any other programs that may be running in the system.
If any of these data structures were allowed to be accidentally or
intentionally altered, the system could quickly crash. Now that we have
learned what a UNIX kernel is and how it is loaded into the system, we
are ready to take a look at the four UNIX subsystems: Process
Management, Memory Management, Filesystem Management, and I/O Management.
Process Management
The Process Management subsystem controls the creation, termination,
accounting, and scheduling of processes. It also oversees process state
transitions and the switching between privileged and nonprivileged modes
of execution. The Process Management subsystem also facilitates and
manages the complex task of the creation of child processes.
A simple definition of a process is that it is an executing program. It
is an entity that requires system resources, and it has a finite
lifetime. It has the capability to create other processes via the system
call interface. In short, it is an electronic representation of a
user's or programmer's desire to accomplish some useful piece of work. A
process may appear to the user as if it is the only job running in the
machine. This "sleight of hand" is only an illusion. At any one time a
processor is only executing a single process.
Process Structure
A process has a definite structure (see Figure 19.1). The kernel views
this string of bits as the process image. This binary image consists of
both a user and system address space as well as registers that store the
process's data during its execution. The user address space is also
known as the user image. This is the code that is written by a
programmer and compiled into an ".o" object file. An object file is a
file that contains machine language code/data and is in a format that
the linker program can use to then create an executable program.
Diagram of process areas.
The user address space consists of five separate areas: text, data, BSS, stack, and user area.
Text Segment The first area of a process is its text segment.
This area contains the executable program code for the process and is
shared by other processes that execute the same program. Because it is
fixed and unchangeable, the text segment does not normally need to be
written to swap; when memory gets tight, its pages can simply be
discarded and re-read from the program file on disk.
Data Area The data area contains both the global and static
variables used by the program. For example, a programmer may know in
advance that a certain data variable needs to be set to a certain value.
In the C programming language, it would look like:
If you were to look at the data segment when the program was
loaded, you would see that the variable x was an integer type with an
initial value of 15.
Bss Area The bss area, like the data area, holds information for
the program's variables. The difference is that the bss area maintains
variables that will have their data values assigned to them during the
program's execution. For example, a programmer may know that she needs
variables to hold certain data that will be input by a user during the
execution of the program.
Code:
int a,b,c;  // a, b and c are uninitialized variables that hold integer values.
char *ptr;  // ptr is an uninitialized character pointer.
The program code can also make calls to library routines like
malloc to obtain a chunk of memory and assign it to a variable like the
one declared above.
Stack Area The stack area maintains the process's local
variables, parameters used in functions, and values returned by
functions. For example, a program may contain code that calls another
block of code (possibly written by someone else). The calling block of
code passes data to the receiving block of code by way of the stack. The
called block of code then processes the data and returns results back to
the calling code. The stack plays an important role in allowing a
process to work with temporary data.
User Area The user area maintains data that is used by the kernel
while the process is running. The user area contains the real and
effective user identifiers, real and effective group identifiers,
current directory, and a list of open files. Sizes of the text, data,
and stack areas, as well as pointers to process data structures, are
maintained. Other areas that can be considered part of the process's
address space are the heap, private shared libraries data, shared
libraries, and shared memory. During initial startup and execution of
the program, the kernel allocates the memory and creates the necessary
structures to maintain these areas.
The user area is used by the kernel to manage the process. This area
maintains the majority of the accounting information for a process. It
is part of the process address space and is only used by the kernel
while the process is executing (see Figure 19.2). When the process is not
executing, its user area may be swapped out to disk by the Memory
Manager. In most versions of UNIX, the user area is mapped to a fixed
virtual memory address.
Under HP-UX 10.X, this virtual address is
0x7FFE6000.
When the kernel performs a context switch (starts executing a different
process) to a new process, it will always map the process's physical
address to this virtual address. Since the kernel already has a pointer
fixed to this location in memory, it is a simple matter of referencing
the current u pointer to be able to begin managing the newly switched in
process. The file
/usr/include/sys/user.h contains the user area's structure definition for your version of UNIX.
Diagram of kernel address space.
Process Table The process table is another important structure
used by the kernel to manage the processes in the system. The process
table is an array of process structures that the kernel uses to manage
the execution of programs. Each table entry defines a process that the
kernel has created. The process table is always resident in the
computer's memory. This is because the kernel is repeatedly querying and
updating this table as it switches processes in and out of the CPU. For
those processes that are not currently executing, their process table
structures are being updated by the kernel for scheduling purposes. The
process structures for your system are defined in
/usr/include/sys/proc.h.
Fork Process The kernel provides each process with the tools to
duplicate itself for the purpose of creating a new process. This new
entity is termed a child process. The fork() system call is invoked by
an existing process (termed the parent process) and creates a replica of
the parent process. While a process will have one parent, it can spawn
many children. The new child process inherits certain attributes from
its parent.
Process Run States
A process moves between several states during its lifetime, although a
process can only be in one state at any one time. Certain events, such
as system interrupts, blocking of resources, or software traps will
cause a process to change its run state. The kernel maintains queues in
memory to which it assigns a process based upon that process's state,
and it keeps track of each process by its process ID.
UNIX version System V Release 4 (SVR4) recognizes the following process run states:
Code:
- SIDL This is the state right after a process has issued
a fork() system call. A process image has yet to be copied into memory.
- SRUN The process is ready to run and is waiting to be executed by the CPU.
- SONPROC The process is currently being executed by the CPU.
- SSLEEP The process is blocking on an event or resource.
- SZOMB The process has terminated and is waiting on
either its parent or the init process to allow it to completely exit.
- SXBRK The process has been switched out so that another process can be executed.
- SSTOP The process is stopped.
When a process first starts, the kernel allocates it a slot in the
process table and places the process in the SIDL state. Once the
process has the resources it needs to run, the kernel places it onto the
run queue. The process is now in the SRUN state awaiting its turn in
the CPU. Once its turn comes for the process to be switched into the
CPU, the kernel will tag it as being in the SONPROC state. In this
state, the process will execute in either user or kernel mode. User mode
is where the process is executing nonprivileged code from the user's
compiled program. Kernel mode is where kernel code is being executed
from the kernel's privileged address space via a system call.
At some point the process is switched out of the CPU because it has
either been signaled to do so (for instance, the user issues a stop
signal--SSTOP state) or the process has exceeded its quota of allowable
CPU time and the kernel needs the CPU to do some work for another
process. The act of switching the focus of the CPU from one process to
another is called a context switch. When this occurs, the process enters
what is known as the SXBRK state. If the process still needs to run and
is waiting for another system resource, such as disk services, it will
enter the SSLEEP state until the resource is available and the kernel
wakes the process up and places it on the SRUN queue. When the process
has finally completed its work and is ready to terminate, it enters the
SZOMB state. We have seen the fundamentals of what states a process can
exist in and how it moves through them. Let's now learn how a kernel
schedules a process to run.
Process Scheduler
Most modern versions of UNIX (for instance, SVR4 and Solaris 2.x) are
classified as preemptive operating systems. They are capable of
interrupting an executing process and "freezing" it so that the CPU
can service a different process. This has the advantage of fairly
allocating the system's resources to all the processes in the system,
which is one goal of the systems architects and programmers
who design and write schedulers. The disadvantages are that not all
processes are equal and that complex algorithms must be designed and
implemented as kernel code in order to maintain the illusion that each
user process is running as if it was the only job in the system. The
kernel maintains this balance by placing processes in the various
priority queues or run queues and apportioning its CPU time-slice based
on its priority class (Real-Time versus Timeshare).
Memory Management
Random access memory (RAM) is a very critical component in any computer
system. It's the one component that always seems to be in short supply
on most systems. Unfortunately, most organizations' budgets don't allow
for the purchase of all the memory that their technical staff feel is
necessary to support all their projects. Luckily, UNIX allows us to
execute all sorts of programs without what appears, at first glance, to
be enough physical memory. This comes in very handy when the system is
required to support a user community that needs to execute an
organization's custom and commercial software to gain access to its
data.
Memory chips are high-speed electronic devices that plug directly into your computer. Main memory is also called
core memory
by some technicians. Ever heard of a core dump? (Writing out main
memory to a storage device for post-dump analysis.) Usually it is caused
by a program or system crash or failure. An important aspect of memory
chips is that they can store data at specific locations called
addresses. This makes it quite convenient for another hardware device
called the central processing unit (CPU) to access these locations to
run your programs. The kernel uses a paging and segmentation arrangement
to organize process memory. This is where the memory management
subsystem plays a significant role. Memory management can be defined as
the efficient managing and sharing of the system's memory resources by
the kernel and user processes.
Memory management follows certain rules that manage both physical and
virtual memory. Since we already have an idea of what a physical memory
chip or card is, we will provide a definition of virtual memory.
Virtual memory
is where the addressable memory locations that a process can be mapped
into are independent of the physical address space of the CPU. Generally
speaking, a process can exceed the physical address space/size of main
memory and still load and execute.
The systems administrator should be aware that just because she has a
fixed amount of physical memory, she should not expect it all to be
available to execute user programs. The kernel is always resident in
main memory, and depending upon the kernel's configuration (tunable
kernel tables, daemons, device drivers loaded, and so on), the amount
left over can be classified as available memory. It is important for the
systems administrator to know how much available memory the system has
to work with when supporting her environment. Most systems display
memory statistics during boot time. If your kernel is larger than it
needs to be to support your environment, consider reconfiguring a
smaller kernel to free up resources.
We learned before that a process has a well-defined structure and has
certain specific control data structures that the kernel uses to manage
the process during its system lifetime. One of the more important data
structures that the kernel uses is the virtual address space (vas in
HP-UX and as in SVR4). For a more detailed description of the layout of
these structures, look at the
vas.h or as.h header files under
/usr/include on your system.
A virtual address space exists for each process and is used by the
process to keep track of process logical segments or regions that point
to specific segments of the process's text (code),
data, u_area, user, and kernel stacks;
shared memory; shared library; and memory mapped file segments.
Per-process regions protect and maintain the number of pages mapped into
the segments. Each segment has a virtual address space segment as well.
Multiple processes can share a program's text segment. The data
segment holds the process's initialized and uninitialized (BSS) data.
These areas can change size as the program executes.
The
u_area and
kernel stack contain information used by the kernel, and are a fixed size. The user stack is contained in the
u_area;
however, its size will fluctuate during its execution. Memory mapped
files allow programmers to bring files into memory and work with them
while in memory. Obviously, there is a limit to the size of the file you
can load into memory (check your system documentation). Shared memory
segments are usually set up and used by a process to share data with
other processes. For example, a programmer may want to be able to pass
messages to other programs by writing to a shared memory segment and
having the receiving programs attach to that specific shared memory
segment and read the message. Shared libraries allow programs to link to
commonly used code at runtime. Shared libraries reduce the amount of
memory needed by executing programs because only one copy of the code is
required to be in memory. Each program will access the code at that
memory location when necessary.
When a programmer writes and compiles a program, the compiler generates
the object file from the source code. The linker program (ld) links the
object file with the appropriate libraries and, if necessary, other
object files to generate the executable program. The executable program
contains virtual addresses that are converted into physical memory
addresses when the program is run. This address translation occurs as
the program executes, so that each virtual reference the CPU makes is
mapped to the actual physical location of the code and data.
When the program starts to run, the kernel sets up its data structures
(proc, virtual address space, per-process region) and begins to execute
the process in user mode. Eventually, the process will access a page
that's not in main memory (for instance, the pages in its working set
are not in main memory). This is called a
page fault. When this
occurs, the kernel puts the process to sleep, switches from user mode to
kernel mode, and attempts to load the page that the process was
requesting to be loaded. The kernel searches for the page by locating
the per-process region where the virtual address is located. It then
goes to the segments (text, data, or other) per-process region to find
the actual region that contains the information necessary to read in the
page.
The kernel must now find a free page in which to load the process's
requested page. If there are no free pages, the kernel must either page
or swap out pages to make room for the new page request. Once there is
some free space, the kernel pages in a block of pages from disk. This
block contains the requested page plus additional pages that may be used
by the process. Finally the kernel establishes the permissions and sets
the protections for the newly loaded pages. The kernel wakes the
process and switches back to user mode so the process can begin
executing using the requested page. Pages are not brought into memory
until the process requests them for execution. This is why the system is
referred to as a
demand paging system.
The memory management unit is a hardware component that handles the
translation of virtual address spaces to physical memory addresses. The
memory management unit also prevents a process from accessing another
process's address space unless it is permitted to do so (protection
fault). Memory is thus protected at the page level. The
Translation Lookaside Buffer (TLB)
is a hardware cache that maintains the most recently used virtual
address space to physical address translations. It is controlled by the
memory management unit to reduce the number of address translations that
occur on the system.
Input and Output Management
The simplest definition of
input/output is the control of data
between hardware devices and software. A systems administrator is
concerned with I/O at two separate levels. The first level is concerned
with I/O between user address space and kernel address space; the second
level is concerned with I/O between kernel address space and physical
hardware devices. When data is written to disk, the first level of the
I/O subsystem copies the data from user space to kernel space. Data is
then passed from the kernel address space to the second level of the I/O
subsystem. This is when the physical hardware device activates its own
I/O subsystems, which determine the best location for the data on the
available disks.
The
OEM (Original Equipment Manufacturer) UNIX configuration is
satisfactory for many work environments, but does not take into
consideration the network traffic or the behavior of specific
applications on your system. Systems administrators find that they need
to reconfigure the system's I/O to meet the expectations of the users and
the demands of their applications. You should use the default
configuration as a starting point and, as experience is gained with the
demands on the system resources, tune the system to achieve peak I/O
performance.
UNIX comes with a wide variety of tools that monitor system performance.
Learning to use these tools will help you determine whether a
performance problem is hardware or software related. Using these tools
will help you determine whether a problem is poor user training,
application tuning, system maintenance, or system configuration.
sar, iostat, and
monitor are some of your best basic I/O performance monitoring tools.
1) sar The sar command writes to standard output the contents of
selected cumulative activity counters in the operating system. The
following list is a breakdown of those activity counters that sar
accumulates.
* File access
* Buffer usage
* System call activity
* Disk and tape input/output activity
* Free memory and swap space
* Kernel Memory Allocation (KMA)
* Interprocess communication
* Paging
* Queue activity
* Central Processing Unit (CPU) utilization
* Kernel tables
* Switching
* Terminal device activity
2) iostat Reports CPU statistics and input/output statistics for TTY devices, disks, and CD-ROMs.
3) monitor Like the sar command, but with a visual representation of the computer state.
RAM I/O
The memory subsystem comes into effect when programs start
requesting access to more physical RAM than is installed on your
system. Once this point is reached, UNIX will start I/O processes called
paging and
swapping. This is when kernel procedures start
moving pages of stored memory out to the paging or swap areas defined
on your hard drives. (This is similar to how swap files work in
Microsoft Windows on a PC.) All UNIX systems use these procedures to
free physical memory for reuse by other programs. The drawback to this
is that once paging and swapping have started, system performance
decreases rapidly. The system will continue using these techniques until
demands for physical RAM drop to the amount that is installed on your
system. There are only two physical states for memory performance on
your system: Either you have enough RAM or you don't, and performance
drops through the floor.
Memory performance problems are simple to diagnose; either you have enough memory or your system is
thrashing.
Computer systems start thrashing when more resources are dedicated to
moving memory (paging and swapping) between RAM and the hard drives
than to doing useful work. Performance decreases as the CPUs and all
subsystems become dedicated to trying to free physical RAM for
themselves and other processes.
This summary doesn't do justice, however, to the complexity of memory
management nor does it help you to deal with problems as they arise. To
provide the background to understand these problems, we need to discuss
virtual memory activity in more detail.
We have been discussing two memory processes: paging and swapping. These
two processes help UNIX fulfill memory requirements for all processes.
UNIX systems employ both paging and swapping to reduce I/O traffic and
execute better control over the system's total aggregate memory. Keep in
mind that paging and swapping are temporary measures; they cannot fix
the underlying problem of low physical RAM memory.
Swapping moves entire idle processes to disk for reclamation of memory,
and is a normal procedure for the UNIX operating system. When the idle
process is called by the system again, it will copy the memory image
from the disk swap area back into RAM.
On systems performing paging and swapping, swapping occurs in two
separate situations. Swapping is often a part of normal housekeeping.
Jobs that sleep for more than 20 seconds are considered idle and may be
swapped out at any time. Swapping is also an emergency technique used to
combat extreme memory shortages. Remember our definition of thrashing;
this is when a system is in trouble. Some system administrators sum this
up very well by calling it "desperation swapping."
Paging, on the other hand, moves individual pages (or pieces) of
processes to disk and reclaims the freed memory, with most of the
process remaining loaded in memory. Paging employs an algorithm to
monitor usage of the pages, to leave recently accessed pages in physical
memory, and to move idle pages into disk storage. This allows for
optimum performance of I/O and reduces the amount of I/O traffic that
swapping would normally require.
NOTE: Monitoring what the system is doing is easy with the ps
command. ps is a "process status" command on all UNIX systems and
typically shows many idle and swapped-out jobs. This command has a rich
amount of options to show you what the computer is doing.
I/O performance management, like all administrative tasks, is a
continual process. Generating performance statistics on a routine basis
will assist in identifying and correcting potential problems before they
have an impact on your system or, worst case, your users. UNIX offers
basic system usage statistics packages that will assist you in
automatically collecting and examining usage statistics.
You will find the load on the system will increase rapidly as new jobs
are submitted and resources are not freed quickly enough. Performance
drops as the disks become I/O bound trying to satisfy paging and
swapping calls. Memory overload quickly forces a system to become I/O
and CPU bound.
Filesystem Concept
A filesystem is both the hierarchical arrangement of directories and
files, and the collection place on disk device(s) for those files.
Visualize the filesystem as consisting of a single node at the highest
level (ROOT) and all other nodes descending from the root node in a
tree-like fashion (see Figure 19.5). The second, on-disk sense will be
used for this discussion, and Hewlett Packard's High-performance
Filesystem will be used for technical reference purposes.
Diagram of a UNIX hierarchical filesystem.
The superblock is the key to maintaining the filesystem. It's an 8 KB
block of disk space that maintains the current status of the filesystem.
Because of its importance, a copy is maintained in memory and at each
cylinder group within the filesystem. The copy in main memory is updated
as events transpire. The update daemon is the actual process that calls
on the kernel to flush the cached superblocks, modified inodes, and
cached data blocks to disk. The superblock maintains the following
static and dynamic information about the