Articles and Papers
Below is a selection of papers that I've written (with links when available online).
"
Scheduling in Hadoop"
IBM developerWorks, December 2011
Hadoop supports pluggable schedulers that assign resources to jobs.
However, as we know from traditional scheduling, not all algorithms are the same, and
efficiency is workload and cluster dependent. Get to know Hadoop scheduling, and
explore two of the algorithms available today: fair scheduling and capacity scheduling.
Also, learn how these algorithms are tuned and in what scenarios they're relevant.
"
Spark, An Alternative for Fast Data Analytics"
IBM developerWorks, November 2011
Although Hadoop captures the most attention for distributed data analytics, there are
alternatives that provide some interesting advantages to the typical Hadoop platform.
Spark is a scalable data analytics platform that incorporates primitives for in-memory
computing and therefore offers some performance advantages over Hadoop's cluster
storage approach. Spark is implemented in and exploits the Scala language, which
provides a unique environment for data processing. Get to know the Spark approach for
cluster computing and its differences from Hadoop.
"
Practical Approaches to Cloud-based High Availability"
IBM developerWorks, October 2011
Although high availability (HA) is a complex activity, regardless of the application,
clouds and their virtualization platforms actually make this objective simpler and
more straightforward. Virtualization as an abstraction from the physical platform
creates a new opportunity for HA. In this article, explore some of the practical
approaches to cloud-based HA, including stateless failover and the more useful
stateful failover. Also, discover the various open source software components at play
in HA systems.
"
Data Mining with Ruby and Twitter"
IBM developerWorks, October 2011
Twitter is not only a fantastic real-time social networking tool, it's also a source of
rich information that's ripe for data mining. On average, Twitter users generate 140
million tweets per day on a variety of topics. This article introduces you to data mining
and demonstrates the concept with the object-oriented Ruby language.
"
Open Source Physics Engines"
IBM developerWorks, July 2011
Graphics give games a visual appeal, but it's the internal physics engine that gives the
game's world life. A physics engine is a software component that provides a simulation of
a physical system. This simulation can include soft- and rigid-body dynamics, fluid
dynamics, and collision detection. The open source community has a number of useful physics
engines operating in the 2D and 3D domains targeted to games and simulations. This article
introduces the use and basics of a physics engine and explores two options that exist: Box2D
and Bullet.
"
Ceylon: True advance, or just another language?"
IBM developerWorks, June 2011
The language road in computer science is littered with the carcasses of what was to be
"the next big thing." And although many niche languages do find some adoption in scripting
or specialized applications, C (and its derivatives) and the Java language are difficult
to displace. But Red Hat's Ceylon appears to be an interesting combination of language
features, using a well-known C-style syntax but with support for object orientation and
useful functional aspects in addition to an emphasis on being succinct. Explore Ceylon and
find out if this future VM language can find a place in enterprise software development.
"
Application Virtualization, past and future"
IBM developerWorks, May 2011
When you hear the phrase virtual machine today, you probably think of virtualization
and hypervisors. But VMs are simply an older concept of abstraction, a common method
of abstracting one entity from another. This article explores two of the many newer
open source VM technologies: Dalvik (the VM core of the Android operating system) and
Parrot (an open source VM technology for efficiently executing dynamic languages).
"
Virtualization and Embedded Systems"
IBM developerWorks, April 2011
Today's technical news is filled with stories of server and desktop virtualization,
but there's another virtualization technology that's growing rapidly: embedded
virtualization. The embedded domain has several useful applications for virtualization,
including mobile handsets, security kernels, and concurrent embedded operating systems.
This article explores the area of embedded virtualization and explains why it's coming
to an embedded system near you.
"
Linux and the Storage Ecosystem"
IBM developerWorks, March 2011
Linux is the Swiss Army knife of file systems, and it also offers a wide variety of
storage technologies for both desktops and servers. Beyond the file system, Linux
incorporates world-class NAS and SAN technologies, data protection, storage management,
support for clouds, and solid-state storage. Learn more about the Linux storage
ecosystem and why it's number one in server market share.
"
Emulation and Computing History"
IBM developerWorks, March 2011
The simplest computing devices we use today have more processing capability than
the most capable computing systems of yesterday. For example, the VAX 11/780
delivered around 0.5 MIPS in the early 1980s. Compare that to an IBM zEnterprise
196 (z196) mainframe of today, which can support well over 52 KMIPS. However, we
can learn a lot from early computing history. If you've ever wanted to boot an
IBM 1130, PDP-11, or MITS Altair, then the Computer History Simulation Project is
just what you've been looking for.
"
Linux Scheduler Simulation"
IBM developerWorks, February 2011
Scheduling is one of the most complex (and interesting) aspects of the Linux kernel.
Developing schedulers that provide suitable behavior for everything from single-core
machines to quad-core servers can be difficult. Luckily, the Linux Scheduler Simulator
(LinSched) hosts your Linux scheduler in user space (for scheduler prototyping)
while modeling arbitrary hardware targets to validate your scheduler across a
spectrum of topologies. Learn about LinSched and how to experiment with your
scheduler for Linux.
"
Data Visualization with Processing, Part 3"
IBM developerWorks, February 2011
This final article in the "Data visualization with Processing" series explores
some of Processing's more advanced features, starting with an introduction to
2-D and 3-D graphics and lighting features. Then explore physics applications
with graphical visualization, learn about Processing's networking features, and
develop a simple application that visualizes data from the Internet.
"
Platform Emulation with Bochs"
IBM developerWorks, January 2011
Bochs, like QEMU, is a portable emulator that provides a virtualization
environment in which to run an operating system using an emulated platform
in the context of another operating system. Bochs isn't a hypervisor but
rather a PC-compatible emulator useful for legacy software. Learn about
platform emulation using Bochs and its approach to hardware emulation.
"
Run ZFS on Linux"
IBM developerWorks, January 2011
Although ZFS exists in an operating system whose future is at risk, it is
easily one of the most advanced, feature-rich file systems in existence. It
incorporates variable block sizes, compression, encryption, de-duplication,
snapshots, clones, and (as the name implies) support for massive capacities.
Get to know the concepts behind ZFS and learn how you can use ZFS today on
Linux using Filesystem in Userspace (FUSE).
"
Data Visualization with Processing, Part 2: Intermediate data visualization using interfaces, objects, images, and applications"
IBM developerWorks, January 2011
Part 1 of this "Data visualization with Processing" series introduced the
Processing language and development environment and demonstrated the language's
basic graphical capabilities. This second article explores Processing's more
advanced features, including UIs and object-oriented programming. Learn about
image processing and how to convert your Processing application into a Java
applet suitable for the web, and explore an optimization algorithm that lends
itself well to visualization.
"
Anatomy of a Cloud Storage Infrastructure"
IBM developerWorks, November 2010
Cloud storage (or data storage as a service) is the abstraction of storage
behind an interface where the storage can be administered on demand. Further,
the interface abstracts the location of the storage such that it is irrelevant
whether the storage is local or remote (or hybrid). Cloud storage
infrastructures introduce new architectures that support varying levels of
service over a potentially large set of users and geographically distributed
storage capacity. Learn about the key architectural attributes of cloud
storage architectures, from data protection and integrity to security and
storage optimization.
"
Data Visualization with Processing, Part 1: An introduction to the language and environment"
IBM developerWorks, November 2010
Building graphical applications and applications that present complex data
can be difficult. Although many graphical libraries exist, they cater to
advanced users or present non-trivial APIs. The Processing language and
environment solves this problem by creating a portable environment and
language for graphical presentation. Processing makes it simple to build
applications that present static data, dynamic data (such as animations),
or interactive data. This first article in the series explores building
applications for visualization, in particular simulations for life sciences.
"
Network Filesystems and Linux"
IBM developerWorks, November 2010
Network File System (NFS) has been around since 1984, but it continues to
evolve and provide the basis for distributed file systems. Today, NFS
(through the pNFS extension) provides scalable access to files distributed
across a network. Explore the ideas behind distributed file systems and in
particular, recent advances in NFS.
"
Virtual networking in Linux"
IBM developerWorks, October 2010
With the explosive growth of platform virtualization, it's not surprising
that other parts of the enterprise ecosystem are being virtualized, as well.
One of the more recent areas is virtual networking. Early implementations of
platform virtualization created virtual NICs, but today, larger portions of
the network are being virtualized, such as switches that support communication
among VMs on a server or distributed among servers. Explore the ideas behind
virtual networking, with a focus on NIC and switch virtualization.
"
Kernel Logging: APIs and Implementation"
IBM developerWorks, September 2010
In kernel development, we use printk for logging without much thought.
But have you considered the process and underlying implementation of
kernel logging? Explore the entire process of kernel logging, from
printk to insertion into the user space log file.
"
Cloud Computing"
Datamation, September 2010
Cloud computing gets enormous attention from tech professionals. And cloud
computing justifies all the focus. Not that it's a simple idea: It's an
amorphous term that is stretched and molded to fit a variety of perspectives.
It will take time to fully understand what cloud computing is and isn't.
In this article, we'll explore cloud computing from the computing and storage
angle, and review some of the most important offerings in the various cloud models.
"
User Space Memory Access from the Linux Kernel"
IBM developerWorks, August 2010
As the kernel and user space exist in different virtual address spaces,
there are special considerations for moving data between them. Explore
the ideas behind virtual address spaces and the kernel APIs for data
movement to and from user space, and learn some of the other mapping
techniques used to map memory.
"
High Availability with the Distributed Replicated Block Device"
IBM developerWorks, August 2010
The 2.6.33 Linux kernel has introduced a useful new service called the
Distributed Replicated Block Device (DRBD). This service mirrors an
entire block device to another networked host during run time,
permitting the development of high-availability clusters for block data.
Explore the ideas behind the DRBD and its implementation in the Linux
kernel.
"
Distributed data processing with Hadoop, Part 3: Application development"
IBM developerWorks, July 2010
With configuration, installation, and the use of Hadoop in single- and
multi-node architectures under your belt, you can now turn to the task
of developing applications within the Hadoop infrastructure. This final
article in the series explores the Hadoop APIs and data flow and
demonstrates their use with a simple mapper and reducer application.
"
Data Migration: EMC, Compellent, 3PAR, FalconStor"
Datamation, July 2010
As is the case for most technology domains, change is the only constant. The
storage ecosystem is a great example, where change is occurring at all
levels, from the individual storage devices to the baseline services
and front-end protocols that are used to manipulate our growing masses of data.
In this article we'll explore one of those services and along the way touch on
many of the related evolutions and revolutions that are happening today.
"
Distributed data processing with Hadoop, Part 2: Going further"
IBM developerWorks, June 2010
The first article in this series showed how to use Hadoop in a single-node
cluster. This article continues with a more advanced setup that uses multiple
nodes for parallel processing. It demonstrates the various node types required
for multinode clusters and explores MapReduce functionality in a parallel
environment. This article also digs into the management aspects of Hadoop, both
command-line and Web-based.
"
Virtualization"
Datamation, May 2010
When you read about virtualization today, much of the content focuses on
the relatively new concept of server virtualization. In this context,
multiple operating system and application sets are virtualized on a single
server, allowing it to be more efficiently and cost effectively used. While
this drives much of the innovation (and revenue) around virtualization today,
there are a multitude of virtualization schemes addressing a spectrum of
applications. In this article, we'll explore many of the ideas around
virtualization and identify their uses and advantages.
"
Distributed data processing with Hadoop, Part 1: Getting Started"
IBM developerWorks, May 2010
This article, the first in a series on Hadoop, explores the Hadoop framework,
including its fundamental elements, such as the Hadoop file system (HDFS),
and node types that are commonly used. Learn how to install and configure a
single-node Hadoop cluster, and delve into the MapReduce application. Finally,
discover ways to monitor and manage Hadoop using its core Web interfaces.
"
Ceph: A Linux Petabyte-scale Distributed File System"
IBM developerWorks, May 2010
Linux continues to invade the scalable computing space and, in particular, the
scalable storage space. A recent addition to Linux's impressive selection of file
systems is Ceph, a distributed file system that incorporates replication and fault
tolerance while maintaining POSIX compatibility. Explore the architecture of Ceph
and learn how it provides fault tolerance and simplifies the management of massive
amounts of data.
"
Virtual Linux: Platform and OS Linux Virtualization"
Datamation, May 2010
Virtual Linux is accomplished through many techniques, ranging from emulation
to platform to OS virtualization. Indeed, Linux is a unique operating system
in its breadth of virtualization solutions that are available. In this article,
we'll explore the various ways that virtualization is achieved and then review
the various solutions provided through virtual Linux.
"
Anatomy of Linux Kernel Shared Memory"
IBM developerWorks, April 2010
Linux as a hypervisor includes a number of innovations, and one of the more
interesting changes in the 2.6.32 kernel is Kernel Shared Memory (KSM). KSM
allows the hypervisor to increase the number of concurrent virtual machines by
consolidating identical memory pages. Explore the ideas behind KSM (such as
storage de-duplication), its implementation, and how you manage it.
"
Kernel APIs, Part 3: Timers and Lists in the 2.6 Kernel"
IBM developerWorks, March 2010
The Linux kernel includes a variety of APIs intended to help developers build
simpler and more efficient driver and kernel applications. Two of the more common
APIs that can be used for work deferral are the list management and timer APIs.
Discover these APIs, and learn how to develop kernel applications with timers and
lists.
"
Anatomy of an Open Source Cloud"
IBM developerWorks, March 2010
Cloud computing is no longer a technology on the cusp of breaking out but a
valuable and important technology that is fundamentally changing the way we
use and develop applications. As you would expect, Linux and open source
provide the foundation for the cloud (for public and private infrastructures).
Explore the anatomy of the cloud, its architecture, and the open source
technologies used to build these dynamic and scalable computing and storage
platforms.
"
Deferrable Functions,
Kernel Tasklets, and Work Queues"
IBM developerWorks, March 2010
For high-frequency threaded operations, the Linux kernel provides tasklets and work
queues. Tasklets and work queues implement deferrable functionality and replace the
older bottom-half mechanism for drivers. This article explores the use of tasklets
and work queues in the kernel and shows you how to build deferrable functions with
these APIs.
"
Invoking User-Space
Applications from the Kernel"
IBM developerWorks, February 2010
The Linux system call interface permits user-space applications to invoke
functionality in the kernel, but what about invoking user-space applications
from the kernel? Explore the usermode-helper API, and learn how to invoke
user-space applications and manipulate their output.
"
Virtio: An I/O
Virtualization Framework for Linux"
IBM developerWorks, January 2010
The Linux kernel supports a variety of virtualization schemes, and that's likely
to grow as virtualization advances and new schemes are discovered (for example,
lguest). But with all these virtualization schemes running on top of Linux, how
do they exploit the underlying kernel for I/O virtualization? The answer is
virtio, which provides an efficient abstraction for hypervisors and a common set
of I/O virtualization drivers. Discover virtio, and learn why Linux will soon be
the hypervisor of choice.
"
Anatomy of the libvirt
Virtualization Library"
IBM developerWorks, January 2010
The libvirt library is a Linux API over the virtualization capabilities of Linux
that supports a variety of hypervisors, including Xen and KVM, as well as QEMU
and some virtualization products for other operating systems. This article explores
libvirt, its use, and its architecture.
"
Inside the Linux 2.6 Completely Fair Scheduler"
IBM developerWorks, December 2009
The task scheduler is a key part of any operating system, and Linux continues
to evolve and innovate in this area. In kernel 2.6.23, the Completely Fair
Scheduler (CFS) was introduced. This scheduler, instead of relying on run queues,
uses a red-black tree implementation for task management. Explore the ideas behind
CFS, its implementation, and advantages over the prior O(1) scheduler.
"
Linux introspection and SystemTap"
IBM developerWorks, November 2009
Modern operating system kernels provide the means for introspection, the ability to
peer dynamically within the kernel to understand its behaviors. These behaviors can
indicate problems in the kernel as well as performance bottlenecks. With this knowledge,
you can tune or modify the kernel to avoid failure conditions. Discover an open source
infrastructure called SystemTap that provides this dynamic introspection for the
Linux kernel.
"
Next-generation Linux file systems: NiLFS(2) and exofs"
IBM developerWorks, November 2009
Linux continues to innovate in the area of file systems. It supports the largest
variety of file systems of any operating system. It also provides cutting-edge
file system technology. Two new file systems that are making their way into Linux
include the NiLFS(2) log-structured file system and the exofs object-based storage
system. Discover the purpose behind these two new file systems and the advantages
that they bring.
"
Virtual appliances and the Open Virtualization Format"
IBM developerWorks, October 2009
Not only has virtualization advanced the state of the art in maximizing server
efficiency, it has also opened the door to new technologies that were not possible
before. One of these technologies is the virtual appliance, which fundamentally
changes the way software is delivered, configured, and managed. But the power
behind virtual appliances lies in the ability to freely share them among different
hypervisors. Learn the ideas and benefits behind virtual appliances, and discover a
standard solution for virtual appliance interoperability called the Open
Virtualization Format.
"
Linux Virtualization and PCI Passthrough"
IBM developerWorks, October 2009
Processors have evolved to improve performance for virtualized environments, but
what about I/O aspects? Discover one such I/O performance enhancement called device
(or PCI) passthrough. This innovation improves performance of PCI devices using
hardware support from Intel (VT-d) or AMD (IOMMU).
"
Meet the Extensible Messaging and Presence Protocol (XMPP)"
IBM developerWorks, September 2009
XMPP is an open protocol for XML-based communication over the Internet. Although
it is most popular as an instant-messaging protocol, you can use it as a general
messaging service, as well. Discover the ins and outs of XMPP, and learn how to
use it for simple messaging.
"
Conversing through the Internet with cURL and libcurl"
IBM developerWorks, September 2009
cURL is a command-line tool that speaks a number of protocols for file transfer,
including HTTP, FTP, Secure Copy (SCP), Telnet, and others. But in addition to
conversing with endpoints over the Internet from the command line, you can also
write simple to complex programs using libcurl to automate application-layer
protocol tasks. This article introduces the cURL command-line tool, then shows
you how to build an HTTP client in C and Python using libcurl.
"
Anatomy of the Linux virtual file system switch"
IBM developerWorks, August 2009
Linux is the very definition of flexibility and extensibility. Take the virtual
file system switch (VFS). You can create file systems on a variety of devices,
including traditional disks, USB flash drives, memory, and other storage devices. You
can even embed a file system within the context of another file system. Discover
what makes the VFS so powerful, and learn its major interfaces and processes.
"
The Blue Programming Language"
IBM developerWorks, August 2009
Languages are the means by which we express our desires to computer systems,
and, as far as I'm concerned, there's no such thing as too many. One unique
language, called Blue, is an open source object-oriented language that is
multipurpose and intuitive to use. This tip provides the foundation for Blue
and shows you how to build simple networking applications.
"
Anatomy of a Linux Hypervisor"
IBM developerWorks, May 2009
One of the most important modern innovations of Linux is its transformation
into a hypervisor (or, an operating system for other operating systems). A
number of hypervisor solutions have appeared that use Linux as the core. This
article explores the ideas behind the hypervisor and two particular hypervisors
that use Linux as the platform (KVM and Lguest).
"
Linux Kernel Advances"
Life's certainties include death and taxes but also the advancement of the
GNU/Linux operating system, and the last two kernel releases did not
disappoint. The 2.6.28 and 2.6.29 releases contain an amazing amount of new
functionality, such as a cutting-edge enterprise storage protocol, two new
file systems, WiMAX broadband networking support, and storage integrity
checking. Discover why it's time to upgrade.
"
Anatomy of ext4"
IBM developerWorks, February 2009
The fourth extended file system, or ext4, is the next generation of
journaling file systems, retaining backward compatibility with the previous
file system, ext3. Although ext4 is not currently the standard, it will be
the next default file system for most Linux distributions. Get to know ext4,
and discover why it will be your new favorite file system.
"
GCC Hacks in the Linux Kernel"
IBM developerWorks, November 2008.
The Linux kernel uses several special capabilities of the GNU Compiler
Collection (GCC) suite. These capabilities range from giving you shortcuts
and simplifications to providing the compiler with hints for optimization.
Discover some of these special GCC features and learn how to use them in
the Linux kernel.
"
Get to know GCC4"
IBM developerWorks, October 2008.
In the last few years, the GNU Compiler Collection (GCC) has undergone a major
transition from GCC version 3 to version 4. With GCC 4 comes a new optimization
framework (and new intermediate code representation), new target and language
support, and a variety of new attributes and options. Get to know the major new
features and their benefits.
"
Cloud Computing with Linux"
IBM developerWorks, September 2008.
Cloud computing and storage convert physical resources (like processors and
storage) into scalable and shareable resources over the Internet (computing
and storage "as a service"). Although not a new concept, virtualization makes
this much more scalable and efficient through the sharing of physical systems
through server virtualization. Cloud computing gives users access to massive
computing and storage resources without their having to know where those
resources are or how they're configured. As you might expect, Linux plays a
huge role. Discover cloud computing, and learn why there's a penguin behind
that silver lining.
"
Anatomy of Linux Dynamic Libraries"
IBM developerWorks, August 2008.
Dynamically linked shared libraries are an important aspect of GNU/Linux. They
allow executables to dynamically access external functionality at run time and
thereby reduce their overall memory footprint (by bringing functionality in when
it's needed). This article investigates the process of creating and using dynamic
libraries, provides details on the various tools for exploring them, and explores
how these libraries work under the hood.
"
Anatomy of Linux Loadable Kernel Modules"
IBM developerWorks, July 2008.
Linux loadable kernel modules, introduced in version 1.2 of the kernel, are one
of the most important innovations in the Linux kernel. They provide a kernel
that is both scalable and dynamic. Discover the ideas behind loadable modules,
and learn how these independent objects dynamically become part of the Linux kernel.
"
Anatomy of Linux Journaling File Systems",
IBM developerWorks, June 2008.
In recent history, journaling file systems were viewed as an oddity and thought of
primarily in terms of research. But today, a journaling file system (ext3) is the
default in Linux. Discover the ideas behind journaling file systems, and learn how
they provide better integrity in the face of a power failure or system crash. Learn
about the various journaling file systems in use today, and peek into the next
generation of journaling file systems.
"
Anatomy of
Linux Flash File Systems", IBM developerWorks, May 2008.
You've probably heard of Journaling Flash File System (JFFS) and Yet Another Flash File
System (YAFFS), but do you know what it means to have a file system that assumes an
underlying flash device? This article introduces you to flash file systems for Linux,
explores how they care for their underlying consumable devices (flash parts) through wear
leveling, and identifies the various flash file systems available along with their fundamental
designs.
"
Anatomy of
Real-Time Linux Architectures", IBM developerWorks, April 2008.
It's not that Linux isn't fast or efficient, but in some cases fast just isn't good enough.
What's needed instead is the ability to deterministically meet scheduling deadlines with
specific tolerances. Discover the various real-time Linux alternatives and how they achieve
real time, from the early architectures that mimic virtualization solutions to the options
available today in the standard 2.6 kernel.
"
Anatomy of Security-Enhanced Linux (SELinux)", IBM developerWorks, April 2008.
Linux has been described as one of the most secure operating systems available, but the
National Security Agency (NSA) has taken Linux to the next level with the introduction
of Security-Enhanced Linux (SELinux). SELinux takes the existing GNU/Linux operating
system and extends it with kernel and user-space modifications to make it bullet-proof.
If you're running a 2.6 kernel today, you might be surprised to know that you're using
SELinux right now! This article explores the ideas behind SELinux and how it's implemented.
"
Explore Ubuntu Mobile and Embedded Edition", IBM developerWorks, January 2008.
Ubuntu Mobile and Embedded Edition (part of the Ubuntu Gutsy Gibbon Linux distribution)
is an environment designed to simplify integration with UMPC (ultra-mobile PC). Ubuntu
includes this support as part of its standard Linux environment. This tutorial introduces
you to UME, the tools provided within it (such as Moblin), and how to build a full
embedded environment for a supported UMPC.
"
Application Development for the OLPC Laptop", IBM developerWorks, December 2007.
This tutorial continues from an earlier article on the OLPC (now called the XO-1).
After providing an introduction to the XO-1 and its history, the tutorial shows you
how to build a simple graphical application in Python for the XO-1. This includes an
introduction to the basic interfaces and methods for integrating with the Sugar
graphical environment.
"
Anatomy of the Linux
SCSI Subsystem", IBM developerWorks, November 2007.
SCSI is a collection of standards that define how to communicate with a large number of devices
(mostly storage related). This article explores SCSI and how it's implemented within the Linux
kernel. It also introduces some of the advances being made in SCSI, such as SAS, FCoE, and DIF.
"
Anatomy of
Linux Synchronization Methods", IBM developerWorks, October 2007.
This article explores the various synchronization methods available in the kernel (such as the
atomic operations, spinlocks, reader/writer locks, and kernel semaphores). It also discusses
concurrency and the reasons behind the need for the methods.
"
Anatomy of the
Linux file system", IBM developerWorks, October 2007.
This article explores filesystems within Linux, which are another great example of abstraction.
For example, when you perform a 'read' operation, it could be on an ext2, ext3, JFFS, or any
other type of filesystem, and on many different types of storage medium (ramdisk, USB flash
stick, SAS disk, etc.). The combinations are huge, but the Linux filesystem layer provides a
model abstraction for dealing with various filesystems on various media. This article
introduces the Linux filesystem layer and then explores the major structures and APIs that
implement this module.
"
System Emulation with QEMU",
IBM developerWorks, September 2007.
QEMU is a platform emulator, which means that you can emulate an entire PC on another operating
system (such as Linux or Windows). This article introduces the ideas behind QEMU, discusses some
of its internals, and then demonstrates emulating another operating system on top of Linux.
"
Anatomy of
the Linux Networking Stack", IBM developerWorks, June 2007.
One of the greatest features of the Linux� operating system is its networking stack. It was
initially a derivative of the BSD stack and is well organized with a clean set of interfaces.
Its interfaces range from the protocol agnostic, such as the common sockets layer interface or
the device layer, to the specific interfaces of the individual networking protocols. This article
explores the structure of the Linux networking stack from the perspective of its layers and also
examines some of its major structures.
"
Anatomy of
the Linux Kernel", IBM developerWorks, June 2007.
The Linux® kernel is the core of a large and complex operating system, and while it's huge, it
is well organized in terms of subsystems and layers. In this article, you explore the general
structure of the Linux kernel and get to know its major subsystems and core interfaces.
"
Anatomy of the
Linux slab allocator", IBM developerWorks, May 2007.
An operating system commonly allocates and deallocates objects of
a fixed size. Additionally, these objects can be initialized to
a given structure. The slab allocator exploits the common size
object behavior of an operating system, and also makes it easy to
expand or shrink the memory requirements for a given object
pool. The slab allocator originated in SunOS, but now
finds its home in the Linux kernel.
"
Discover the Linux Kernel
Virtual Machine", IBM developerWorks, May 2007.
The newcomer to Linux virtualization is the Linux Kernel Virtual
Machine, or KVM. This modification to the Linux kernel converts
it into a hypervisor, allowing it to host other operating systems
such as Linux and Windows. The Linux KVM requires a processor
with virtualization instructions, as can be found with the AMD
Pacifica or Intel VT.
"
Sugar, the XO laptop,
and One Laptop per Child", IBM developerWorks, April 2007.
OLPC is the One Laptop per Child initiative, and its goal is to
develop a $100 laptop for children around the world (now $150).
The laptop itself is very interesting, as the laptop must be
useful in different environments than our own. But what's most
interesting about the XO laptop is that it runs GNU/Linux and
is programmable using the Python language. This article explores
the XO laptop and shows you how to build a simple activity
(application) for a virtualized XO.
"
Virtualization
with coLinux", IBM developerWorks, April 2007.
Cooperation is probably the last thing that comes to mind when
considering Linux and MS Windows, but that's what you get with
coLinux. coLinux is a cooperative Linux kernel that
virtualizes an entire Linux operating system on top of MS
Windows. You can get something similar with Cygwin, but coLinux
has some advantages.
"
Linux
and Symmetric Multiprocessing", IBM developerWorks, March 2007.
Linux and SMP are two great tastes that taste great together.
This article provides an introduction to multiprocessing (in particular
Chip-Level Multiprocessing, or CMP) and then discusses some of the SMP
features of the Linux kernel. It also briefly discusses how to
exploit SMP for user-space applications.
"
Parallelize
Applications for Faster Linux Booting", IBM developerWorks, March
2007.
Linux out of the box is a general solution for desktop and server
platforms. But booting Linux, especially if you're a developer
(particularly a kernel developer) can be a pain due to the time it
takes to complete. This article reviews two approaches to
parallelizing the boot process through init replacements. Initng
is a dependency-based solution: services depend upon
one another, and once one service has started, other services that
depend upon it can start. Upstart is an event-based solution
to init. When a service starts, it can send events to kick off
other services. Also explored in this article is bootchart, which
is used to visualize the Linux boot process.
"
Virtual
Linux", IBM developerWorks, December 2006.
Linux virtualization has many solutions, from full virtualization
and para-virtualization to emulation and many others. This article
explores the various methods that are available today for Linux
virtualization, including the new kid on the block, the Kernel Virtual
Machine (KVM). Read the comments at
Slashdot.
This article has been translated into
Russian
and
Korean.
"
Build
a Web Spider on Linux", IBM developerWorks, November 2006.
The goal of this article is to explore the various methods for
developing web spiders on Linux. It illustrates spider
development using Python and Ruby. You can read the sordid
comments on
Slashdot,
or
Digg.
"
Data
Visualization with Linux", IBM developerWorks, November 2006.
Linux is a great platform for data manipulation and
visualization. From GNUPlot and Octave to Scilab and IBM's
OpenDX, this article covers the most useful tools, with examples presented
for each.
"
Version
Control for Linux", IBM developerWorks, October 2006.
One of the great aspects of Linux for developers is the wide range of
source configuration management (SCM) systems that are available.
From centralized to distributed repositories, and change-set versus
snapshot models, this article explores the major SCM architectures and
provides examples of each. Also available in
Japanese.
"
New to
IBM Systems", IBM developerWorks, September 2006.
This article is fundamentally a marketing piece that introduces readers
to IBM servers.
"
Open
Source Robotics Toolkits", IBM developerWorks, September 2006.
This article reviews a number of open-source toolkits for robotics
simulation and development, from the Open Dynamics Engine (ODE)
for modelling realistic physics to TeamBots for modelling multi-agent
systems. Here's an
intro from
LinuxDevices, and a discussion at
robots.net.
"
Boost
Application Performance Using Asynchronous I/O", IBM
developerWorks, August 2006.
Asynchronous I/O (or AIO) is a POSIX mechanism to increase performance
of overlapped I/O applications by providing callback mechanisms for I/O
completion. This article explores the variety of I/O models
available for Linux, and then digs into the AIO model with source
demonstration. The article is now a
reference on
Wikipedia.
"
BusyBox
simplifies embedded Linux Systems", IBM developerWorks, August 2006.
BusyBox is the Swiss Army knife of Linux utilities. BusyBox is
interesting because it combines a large number of utilities into a
single binary, allowing them to share the underlying common code.
This makes it a perfect utility for embedded Linux systems.
Here's an
intro
at LinuxDevices.com. Also available in
Chinese.
"
Anatomy
of the Linux Initial Ramdisk (initrd)", IBM developerWorks, July
2006.
The Initial Ramdisk (or initrd) is a temporary root filesystem in RAM
that acts as an intermediary filesystem for module loading while the
real root filesystem is not yet available. This article explores
the anatomy of the initrd, and demonstrates how you can build one from
scratch. This article has been translated into
Japanese.
"
Inside
the Linux Scheduler", IBM developerWorks Linux Zone, June 2006.
The Linux scheduler has evolved greatly over the years, and with the
2.6 kernel, has been transformed from an O(N) (linear time) scheduler
to an O(1) (constant time) scheduler. This article discusses the
new Linux 2.6 kernel and other aspects such as SMP support and load
balancing. Here's an
intro
from LinuxDevices. This article has been translated into
Japanese.
"
Inside
the Linux boot process", IBM developerWorks Linux Zone, May 2006.
The Linux boot process is very flexible, and supports booting on a
large number of platforms from a variety of devices (hard disk, floppy,
CD-ROM, USB Flash, network, etc.). This article will walk you
through the desktop x86 boot process, but also provide some information
for embedded and network booting.
"
Access
the Linux Kernel using the /proc filesystem", IBM developerWorks
Linux Zone, March 2006.
The /proc virtual filesystem is a great way to permit communication
(configuration and monitoring) between user-space applications and the
kernel. In this article you'll learn about /proc and
explore a demonstration of a fortune cookie dispenser implemented as a
kernel module with /proc. This article has been translated into
Chinese,
Japanese,
and also an introduction in
Korean.
"
Better
networking with the Stream Control Transmission Protocol (SCTP)",
IBM developerWorks, February 2006.
In this article, I review the benefits of SCTP over TCP (from
multi-streaming to multi-homing). Sample code is also presented
demonstrating the multi-streaming feature. This article was also
slashdotted.
It has been translated into
Chinese.
"
Automate
Client Management with the Service Location Protocol (SLP)", IBM
developerWorks, February 2006.
Discusses the zero-configuration networking capabilities of SLP and
demonstrates its use with a simple Daytime protocol example.
This article has been translated into
Chinese.
"
Boost
Socket Performance on Linux", IBM developerWorks Linux Zone,
January 2006.
This article demonstrates four ways to boost the performance of sockets
applications, from socket buffer tuning to kernel proc filesystem
tuning. Also read the
sordid
Slashdot comments on this article.
"
Sockets
Programming in Ruby", IBM developerWorks Linux Zone, October 2005.
Tutorial exploring the Sockets API and its integration into the Ruby
object-oriented scripting language. Discusses Ruby-specific
features for sockets programming.
"
Sockets
Programming in Python", IBM developerWorks Linux Zone, October 2005.
Tutorial exploring the Sockets API and its integration into the Python
language. Discusses Python-specific features for sockets
programming.
"
Five
pitfalls of Linux sockets programming", IBM developerWorks Linux
Zone, September 2005.
Discusses the development of reliable networking applications in
heterogeneous environments. Translations are available in
Japanese
and also
Chinese.
"
Visualize
Function Calls with Graphviz", IBM developerWorks Linux Zone, June
2005.
Using the GNU Compiler Toolchain, and a small amount of glue code, a
dynamic graphical function call generator can be easily created.
Translations of this article are available in
Japanese
and also
Chinese.
"GNU's C Language Extensions", C/C++ Users
Journal, March 2005.
The GNU Compiler includes a variety of language extensions. This
article explores some of the more useful elements.
"
Optimizing with GCC",
Linux
Journal, March 2005.
The GNU Compiler Collection (otherwise known as
GCC)
is the de facto standard compiler for Linux and also multi-platform
embedded development. This article discusses the 3.3 GCC
optimizer and how to use it effectively to build optimized applications.
"Defensive Programming", C/C++ Users
Journal, February 2005.
This article focuses on programming for reliability: given that we
make mistakes in the development of software, how can we program in a
way that minimizes some common, and difficult-to-debug, mistakes?
"
GNU
Development", Circuit Cellar, January 2004.
This article provides a tour of software development with
GNU tools, including the GNU compiler
toolchain, build automation with make and a variety of other
utilities. This article is also available on
Developer::Pipelines.
"
An Embeddable
Lightweight XML-RPC Server", Dr. Dobb's Journal, June 2003.
The
XML-RPC protocol is
explored in this article, with a simple implementation of a server in
C. The server is then demonstrated using C and
Python clients.
"
Personalization
and
Adaptive Resonance Theory", Dr. Dobb's Journal, October 2002.
This article discusses the use of the ART1 clustering algorithm for
personalization (recommendation).
"
Java Mobile
Agents
and the Aglets SDK", Dr. Dobb's Journal, January 2002.
Demonstrates the construction of simple mobile agents (migratory
programs) in Java using
IBM's Aglets SDK.
"
Embed with
the Mailman", Embedded Systems Programming, October 2001.
An
SMTP server
and client are discussed with source code suitable for use in embedded
systems. The
SMTP client
is discussed in applications of remote statusing (emitting data to a
remote client). The
SMTP server
is discussed from the perspective of command and control (sending
emails to the embedded device with commands, and receiving back
responses from the onboard
SMTP client).
"
An
Embeddable HTTP Server", Dr. Dobb's Journal, October 2001.
An embeddable
HTTP server
is presented that is suitable not only for embedded systems, but also for those
without file systems (EEPROM-based). The concept of an
application filesystem is presented along with the tools to build and
integrate it with the
HTTP server.
"
Embedded
Linux on the PowerPC", Embedded Linux Journal, July 2001.
In this article, I explore the use of Linux (
Montavista Linux) on the
Embedded Planet RPX-Net
PowerPC board.
"
CPJazz
-- A Software Framework for Vehicle Systems Integration and
Wireless Connectivity"
SAE 2000 World Congress. Also appears in the book "
Intelligent
Vehicle Systems", ISBN 0-7680-0588-4.
Discusses research in connectivity between disparate devices and
vehicle buses to disparate wireless assets in a vehicle
environment. The flexible CPJazz architecture provides a
"software bus" architecture to seamlessly integrate buses and devices
for intercommunication.
Presentations
"
High Performance Networking",
May 2003
I gave this presentation as a guest lecture in Sam Siewert's "Real-Time
Embedded Systems" course at Colorado University in the Spring of
2003. In this presentation, I discussed problems and solutions
for scaling TCP/IP networking to gigabit networks through a variety of
means.
mtj@mtjones.com.
Last Updated December 2011.