M. Tim Jones Publications

Articles and Papers

Below is a selection of papers that I've written (with links when available online).

"Scheduling in Hadoop" IBM developerWorks, December 2011
Hadoop implements the ability for pluggable schedulers that assign resources to jobs. However, as we know from traditional scheduling, not all algorithms are the same, and efficiency is workload and cluster dependent. Get to know Hadoop scheduling, and explore two of the algorithms available today: fair scheduling and capacity scheduling. Also, learn how these algorithms are tuned and in what scenarios they're relevant.

"Spark, An Alternative for Fast Data Analytics" IBM developerWorks, November 2011
Although Hadoop captures the most attention for distributed data analytics, there are alternatives that provide some interesting advantages to the typical Hadoop platform. Spark is a scalable data analytics platform that incorporates primitives for in-memory computing and therefore exercises some performance advantages over Hadoop's cluster storage approach. Spark is implemented in and exploits the Scala language, which provides a unique environment for data processing. Get to know the Spark approach for cluster computing and its differences from Hadoop.

"Practical Approaches to Cloud-based High Availability" IBM developerWorks, October 2011
Although high availability (HA) is a complex activity, regardless of the application, clouds and their virtualization platforms actually make this objective simpler and more straightforward. Virtualization as an abstraction from the physical platform creates a new opportunity for HA. In this article, explore some of the practical approaches to cloud-based HA, including stateless failover and the more useful stateful failover. Also, discover the various open source software components at play in HA systems.

"Data Mining with Ruby and Twitter" IBM developerWorks, October 2011
Twitter is not only a fantastic real-time social networking tool, it's also a source of rich information that's ripe for data mining. On average, Twitter users generate 140 million tweets per day on a variety of topics. This article introduces you to data mining and demonstrates the concept with the object-oriented Ruby language.

"Open Source Physics Engines" IBM developerWorks, July 2011
Graphics give games a visual appeal, but it's the internal physics engine that gives the game's world life. A physics engine is a software component that provides a simulation of a physical system. This simulation can include soft- and rigid-body dynamics, fluid dynamics, and collision detection. The open source community has a number of useful physics engines operating in the 2D and 3D domains targeted to games and simulations. This article introduces the use and basics of a physics engine and explores two options that exist: Box2D and Bullet.

"Ceylon: True advance, or just another language?" IBM developerWorks, June 2011
The language road in computer science is littered with the carcasses of what was to be "the next big thing." And although many niche languages do find some adoption in scripting or specialized applications, C (and its derivatives) and the Java language are difficult to displace. But Red Hat's Ceylon appears to be an interesting combination of language features, using a well-known C-style syntax but with support for object orientation and useful functional aspects in addition to an emphasis on being succinct. Explore Ceylon and find out if this future VM language can find a place in enterprise software development.

"Application Virtualization, past and future" IBM developerWorks, May 2011
When you hear the phrase virtual machine today, you probably think of virtualization and hypervisors. But VMs are simply an older concept of abstraction, a common method of abstracting one entity from another. This article explores two of the many newer open source VM technologies: Dalvik (the VM core of the Android operating system) and Parrot (an open source VM technology for efficiently executing dynamic languages).

"Virtualization and Embedded Systems" IBM developerWorks, April 2011
Today's technical news is filled with stories of server and desktop virtualization, but there's another virtualization technology that's growing rapidly: embedded virtualization. The embedded domain has several useful applications for virtualization, including mobile handsets, security kernels, and concurrent embedded operating systems. This article explores the area of embedded virtualization and explains why it's coming to an embedded system near you.

"Linux and the Storage Ecosystem" IBM developerWorks, March 2011
Linux is the Swiss Army knife of file systems, and it also offers a wide variety of storage technologies for both desktops and servers. Beyond the file system, Linux incorporates world-class NAS and SAN technologies, data protection, storage management, support for clouds, and solid-state storage. Learn more about the Linux storage ecosystem and why it's number one in server market share.

"Emulation and Computing History" IBM developerWorks, March 2011
The simplest computing devices we use today have more processing capability than the most capable computing systems of yesterday. For example, the VAX 11/780 delivered around 0.5 MIPS in the early 1980s. Compare that to an IBM zEnterprise 196 (z196) mainframe of today, which can support well over 52 KMIPS. However, we can learn a lot from early computing history. If you've ever wanted to boot an IBM 1130, PDP-11, or MITS Altair, then the Computer History Simulation Project is just what you've been looking for.

"Linux Scheduler Simulation" IBM developerWorks, February 2011 Scheduling is one of the most complex�and interesting�aspects of the Linux kernel. Developing schedulers that provide suitable behavior for single-core machines to quad-core servers can be difficult. Luckily, the Linux Scheduler Simulator (LinSched) hosts your Linux scheduler in user space (for scheduler prototyping) while modeling arbitrary hardware targets to validate your scheduler across a spectrum of topologies. Learn about LinSched and how to experiment with your scheduler for Linux.

"Data Visualization with Processing, Part 3" IBM developerWorks, February 2011 This final article in the "Data visualization with Processing" series explores some of Processing's more advanced features, starting with an introduction to 2-D and 3-D graphics and lighting features. Then explore physics applications with graphical visualization, learn about Processing's networking features, and develop a simple application that visualizes data from the Internet.

"Platform Emulation with Bochs" IBM developerWorks, January 2011
Bochs, like QEMU, is a portable emulator that provides a virtualization environment in which to run an operating system using an emulated platform in the context of another operating system. Bochs isn't a hypervisor but rather a PC-compatible emulator useful for legacy software. Learn about platform emulation using Bochs and its approach to hardware emulation.

"Run ZFS on Linux" IBM developerWorks, January 2011
Although ZFS exists in an operating system whose future is at risk, it is easily one of the most advanced, feature-rich file systems in existence. It incorporates variable block sizes, compression, encryption, de-duplication, snapshots, clones, and (as the name implies) support for massive capacities. Get to know the concepts behind ZFS and learn how you can use ZFS today on Linux using Filesystem in Userspace (FUSE).

"Data Visualization wtih Processing, Part 2: Intermediate data visualization using interfaces, objects, images, and applications" IBM developerWorks, January 2011
Part 1 of this "Data visualization with Processing" series introduces the Processing language and development environment and demonstrated the language's basic graphical capabilities. This second article explores Processing's more advanced features, including UIs and object-oriented programming. Learn about image processing and how to convert your Processing application into a Java applet suitable for the web, and explore an optimization algorithm that lends itself well to visualization.

"Anatomy of a Cloud Storage Infrastructure" IBM developerWorks, November 2010
Cloud storage (or data storage as a service) is the abstraction of storage behind an interface where the storage can be administered on demand. Further, the interface abstracts the location of the storage such that it is irrelevant whether the storage is local or remote (or hybrid). Cloud storage infrastructures introduce new architectures that support varying levels of service over a potentially large set of users and geographically distributed storage capacity. Learn about the key architectural attributes of cloud storage architectures, from data protection and integrity to security and storage optimization.

"Data Visualization with Processing, Part 1: An introduction to the language and environment" IBM developerWorks, November 2010
Building graphical applications and applications that present complex data can be difficult. Although many graphical libraries exist, they cater to advanced users or present non-trivial APIs. The Processing language and environment solve this problem by creating a portable environment and language for graphical presentation. Processing makes it simple to build applications that present static data, dynamic data (such as animations), or interactive data. This first article in the series explores building applications for visualization, in particular simulations for life sciences.

"Network Filesystems and Linux" IBM developerWorks, November 2010
Network File System (NFS) has been around since 1984, but it continues to evolve and provide the basis for distributed file systems. Today, NFS (through the pNFS extension) provides scalable access to files distributed across a network. Explore the ideas behind distributed file systems and, in particular, recent advances in NFS.

"Virtual networking in Linux" IBM developerWorks, October 2010
With the explosive growth of platform virtualization, it's not surprising that other parts of the enterprise ecosystem are being virtualized, as well. One of the more recent areas is virtual networking. Early implementations of platform virtualization created virtual NICs, but today, larger portions of the network are being virtualized, such as switches that support communication among VMs on a server or distributed among servers. Explore the ideas behind virtual networking, with a focus on NIC and switch virtualization.

"Kernel Logging: APIs and Implementation" IBM developerWorks, September 2010
In kernel development, we use printk for logging without much thought. But have you considered the process and underlying implementation of kernel logging? Explore the entire process of kernel logging, from printk to insertion into the user space log file.

"Cloud Computing" Datamation, September 2010
Cloud computing gets enormous attention from tech professionals. And cloud computing justifies all the focus. Not that it's a simple idea: It's an amorphous term that is stretched and molded to fit a variety of perspectives. It will take time to fully understand what cloud computing is and isn't. In this article, we'll explore cloud computing from the computing and storage angle, and review some of the most important offerings in the various cloud models.

"User Space Memory Access from the Linux Kernel" IBM developerWorks, August 2010
As the kernel and user space exist in different virtual address spaces, there are special considerations for moving data between them. Explore the ideas behind virtual address spaces and the kernel APIs for data movement to and from user space, and learn some of the other mapping techniques used to map memory.

"High Availability with the Distributed Replicated Block Device" IBM developerWorks, August 2010
The 2.6.33 Linux kernel has introduced a useful new service called the Distributed Replicated Block Device (DRBD). This service mirrors an entire block device to another networked host during run time, permitting the development of high-availability clusters for block data. Explore the ideas behind the DRBD and its implementation in the Linux kernel.

" "Distributed data processing with Hadoop, Part 3: Application development" IBM developerWorks, July 2010
With configuration, installation, and the use of Hadoop in single- and multi-node architectures under your belt, you can now turn to the task of developing applications within the Hadoop infrastructure. This final article in the series explores the Hadoop APIs and data flow and demonstrates their use with a simple mapper and reducer application.

"Data Migration: EMC, Compellent, 3PAR, FalconStor" Datamation, July 2010
As is the case for most technology domains, change is the only constant. The storage ecosystem is a great example, where change is not only occurring, but at all levels from the individual storage devices, to the baseline services and front-end protocols that are used to manipulate our growing masses of data. In this article we'll explore one of those services and along the way touch on many of the related evolutions and revolutions that are happening today.

"Distributed data processing with Hadoop, part 2: Going further" IBM developerWorks, June 2010
The first article in this series showed how to use Hadoop in a single-node cluster. This article continues with a more advanced setup that uses multiple nodes for parallel processing. It demonstrates the various node types required for multinode clusters and explores MapReduce functionality in a parallel environment. This article also digs into the management aspects of Hadoop, both command-line and Web-based.

"Virtualization" Datamation, May 2010
When you read about virtualization today, much of the content focuses on the relatively new concept of server virtualization. In this context, multiple operating system and application sets are virtualized on a single server, allowing it to be more efficiently and cost effectively used. While this drives much of the innovation (and revenue) around virtualization today, there are a multitude of virtualization schemes addressing a spectrum of applications. In this article, we'll explore many of the ideas around virtualization and identify their uses and advantages.

"Distributed data processing with Hadoop, part 1: Getting Started" IBM developerWorks, May 2010
This article, the first in a series on Hadoop, explores the Hadoop framework, including its fundamental elements, such as the Hadoop file system (HDFS), and node types that are commonly used. Learn how to install and configure a single-node Hadoop cluster, and delve into the MapReduce application. Finally, discover ways to monitor and manage Hadoop using its core Web interfaces.

"Ceph: A Linux Petabyte-scale Distributed File System" IBM developerWorks, May 2010
Linux continues to invade the scalable computing space and, in particular, the scalable storage space. A recent addition to Linux's impressive selection of file systems is Ceph, a distributed file system that incorporates replication and fault tolerance while maintaining POSIX compatibility. Explore the architecture of Ceph and learn how it provides fault tolerance and simplifies the management of massive amounts of data.

"Virtual Linux: Platform and OS Linux Virtualization" Datamation, May 2010
Virtual Linux is accomplished through many techniques, ranging from emulation to platform to OS virtualization. Indeed, Linux is a unique operating system in its breadth of virtualization solutions that are available. In this article, we'll explore the various ways that virtualization is achieved and then review the various solutions provided through virtual Linux.

"Anatomy of Linux Kernel Shared Memory" IBM developerWorks, April 2010
Linux as a hypervisor includes a number of innovations, and one of the more interesting changes in the 2.6.32 kernel is Kernel Shared Memory (KSM). KSM allows the hypervisor to increase the number of concurrent virtual machines by consolidating identical memory pages. Explore the ideas behind KSM (such as storage de-duplication), its implementation, and how you manage it.

"Kernel APIs, Part 3: Timers and Lists in the 2.6 Kernel" IBM developerWorks, March 2010
The Linux kernel includes a variety of APIs intended to help developers build simpler and more efficient driver and kernel applications. Two of the more common APIs that can be used for work deferral are the list management and timer APIs. Discover these APIs, and learn how to develop kernel applications with timers and lists.

"Anatomy of an Open Source Cloud" IBM developerWorks, March 2010
Cloud computing is no longer a technology on the cusp of breaking out but a valuable and important technology that is fundamentally changing the way we use and develop applications. As you would expect, Linux and open source provide the foundation for the cloud (for public and private infrastructures). Explore the anatomy of the cloud, its architecture, and the open source technologies used to build these dynamic and scalable computing and storage platforms.

"Deferrable Functions, Kernel Tasklets, and Work Queues" IBM developerWorks, March 2010
For high-frequency threaded operations, the Linux kernel provides tasklets and work queues. Tasklets and work queues implement deferrable functionality and replace the older bottom-half mechanism for drivers. This article explores the use of tasklets and work queues in the kernel and shows you how to build deferrable functions with these APIs.

"Invoking User-Space Applications from the Kernel" IBM developerWorks, February 2010
The Linux system call interface permits user-space applications to invoke functionality in the kernel, but what about invoking user-space applications from the kernel? Explore the usermode-helper API, and learn how to invoke user-space applications and manipulate their output.

"Virtio: An I/O Virtualization Framework for Linux" IBM developerWorks, January 2010
The Linux kernel supports a variety of virtualization schemes, and that's likely to grow as virtualization advances and new schemes are discovered (for example, lguest). But with all these virtualization schemes running on top of Linux, how do they exploit the underlying kernel for I/O virtualization? The answer is virtio, which provides an efficient abstraction for hypervisors and a common set of I/O virtualization drivers. Discover virtio, and learn why Linux will soon be the hypervisor of choice.

"Anatomy of the libvirt Virtualization Library" IBM developerWorks, January 2010
The libvirt library is a Linux API over the virtualization capabilities of Linux that supports a variety of hypervisors, including Xen and KVM, as well as QEMU and some virtualization products for other operating systems. This article explores libvirt, its use, and its architecture.

"Inside the Linux 2.6 Completely Fair Scheduler" IBM Developerworks, December 2009
The task scheduler is a key part of any operating system, and Linux� continues to evolve and innovate in this area. In kernel 2.6.23, the Completely Fair Scheduler (CFS) was introduced. This scheduler, instead of relying on run queues, uses a red-black tree implementation for task management. Explore the ideas behind CFS, its implementation, and advantages over the prior O(1) scheduler.

"Linux introspection and SystemTap" IBM Developerworks, November 2009
Modern operating system kernels provide the means for introspection, the ability to peer dynamically within the kernel to understand its behaviors. These behaviors can indicate problems in the kernel as well as performance bottlenecks. With this knowledge, you can tune or modify the kernel to avoid failure conditions. Discover an open source infrastructure called SystemTap that provides this dynamic introspection for the Linux� kernel.

"Next-generation Linux file systems: NiLFS(2) and exofs" IBM Developerworks, November 2009
Linux� continues to innovate in the area of file systems. It supports the largest variety of file systems of any operating system. It also provides cutting-edge file system technology. Two new file systems that are making their way into Linux include the NiLFS(2) log-structured file system and the exofs object-based storage system. Discover the purpose behind these two new file systems and the advantages that they bring.

"Virtual appliances and the Open Virtualization Format" IBM Developerworks, October 2009
Not only has virtualization advanced the state of the art in maximizing server efficiency, it has also opened the door to new technologies that were not possible before. One of these technologies is the virtual appliance, which fundamentally changes the way software is delivered, configured, and managed. But the power behind virtual appliances lies in the ability to freely share them among different hypervisors. Learn the ideas and benefits behind virtual appliances, and discover a standard solution for virtual appliance interoperability called the Open Virtualization Format.

"Linux Virtualization and PCI Passthrough" IBM Developerworks, October 2009
Processors have evolved to improve performance for virtualized environments, but what about I/O aspects? Discover one such I/O performance enhancement called device (or PCI) passthrough. This innovation improves performance of PCI devices using hardware support from Intel (VT-d) or AMD (IOMMU).

"Meet the Extensible Messaging and Presence Protocol (XMPP)" IBM Developerworks, September 2009
XMPP is a open protocol for XML-based communication over the Internet. Although it is most popular as an instant-messaging protocol, you can use it as a general messaging service, as well. Discover the ins and outs of XMPP, and learn how to use it for simple messaging.

"Conversing through the Internet with cURL and libcurl" IBM Developerworks, September 2009
cURL is a command-line tool that speaks a number of protocols for file transfer, including HTTP, FTP, Secure Copy (SCP), Telnet, and others. But in addition to conversing with endpoints over the Internet from the command line, you can also write simple to complex programs using libcurl to automate application-layer protocol tasks. This article introduces the cURL command-line tool, then shows you how to build an HTTP client in C and Python using libcurl.

"Anatomy of the Linux virtual file system switch" IBM Developerworks, August 2009
Linux is the very definition of flexibility and extensibility. Take the virtual file system switch (VFS). You can create file systems on a variety of devices, from traditional disk, USB flash drives, memory, and other storage devices. You can even embed a file system within the context of another file system. Discover what makes the VFS so powerful, and learn its major interfaces and processes.

"The Blue Programming Language" IBM Developerworks, August 2009
Languages are the means by which we express our desires to computers systems, and, as far as I'm concerned, there's no such thing as too many. One unique language, called Blue, is an open source object-oriented language that is multipurpose and intuitive to use. This tip provides the foundation for Blue and shows you how to build simple networking applications.

"Anatomy of a Linux Hypervisor" IBM Developerworks, May 2009
One of the most important modern innovations of Linux� is its transformation into a hypervisor (or, an operating system for other operating systems). A number of hypervisor solutions have appeared that use Linux as the core. This article explores the ideas behind the hypervisor and two particular hypervisors that use Linux as the platform (KVM and Lguest).

"Linux Kernel Advances"
Life's certainties include death and taxes but also the advancement of the GNU/Linux operating system, and the last two kernel releases did not disappoint. The 2.6.28 and 2.6.29 releases contain an amazing amount of new functionality, such as a cutting-edge enterprise storage protocol, two new file systems, WiMAX broadband networking support, and storage integrity checking. Discover why it's time to upgrade.

"Anatomy of ext4" IBM DeveloperWorks, February 2009
The fourth extended file system, or ext4, is the next generation of journaling file systems, retaining backward compatibility with the previous file system, ext3. Although ext4 is not currently the standard, it will be the next default file system for most Linux� distributions. Get to know ext4, and discover why it will be your new favorite file system.

"GCC Hacks in the Linux Kernel" IBM DeveloperWorks, November 2008.
The Linux kernel uses several special capabilities of the GNU Compiler Collection (GCC) suite. These capabilities range from giving you shortcuts and simplifications to providing the compiler with hints for optimization. Discover some of these special GCC features and learn how to use them in the Linux kernel.

"Get to know GCC4" IBM DeveloperWorks, October 2008.
In the last few years, the GNU Compiler Collection (GCC) has undergone a major transition from GCC version 3 to version 4. With GCC 4 comes a new optimization framework (and new intermediate code representation), new target and language support, and a variety of new attributes and options. Get to know the major new features and their benefits.

"Cloud Computing with Linux" IBM DeveloperWorks, September 2008.
Cloud computing and storage convert physical resources (like processors and storage) into scalable and shareable resources over the Internet (computing and storage "as a service"). Although the concept is not new, server virtualization makes it much more scalable and efficient through the sharing of physical systems. Cloud computing gives users access to massive computing and storage resources without their having to know where those resources are or how they're configured. As you might expect, Linux plays a huge role. Discover cloud computing, and learn why there's a penguin behind that silver lining.

"Anatomy of Linux Dynamic Libraries" IBM DeveloperWorks, August 2008.
Dynamically linked shared libraries are an important aspect of GNU/Linux. They allow executables to dynamically access external functionality at run time and thereby reduce their overall memory footprint (by bringing functionality in when it's needed). This article investigates the process of creating and using dynamic libraries, provides details on the various tools for exploring them, and explores how these libraries work under the hood.

"Anatomy of Linux Loadable Kernel Modules" IBM DeveloperWorks, July 2008.
Linux loadable kernel modules, introduced in version 1.2 of the kernel, are one of the most important innovations in the Linux kernel. They provide a kernel that is both scalable and dynamic. Discover the ideas behind loadable modules, and learn how these independent objects dynamically become part of the Linux kernel.

"Anatomy of Linux Journaling File Systems", IBM developerWorks, June 2008.
In recent history, journaling file systems were viewed as an oddity and thought of primarily in terms of research. But today, a journaling file system (ext3) is the default in Linux. Discover the ideas behind journaling file systems, and learn how they provide better integrity in the face of a power failure or system crash. Learn about the various journaling file systems in use today, and peek into the next generation of journaling file systems.

"Anatomy of Linux Flash File Systems", IBM developerWorks, May 2008.
You've probably heard of Journaling Flash File System (JFFS) and Yet Another Flash File System (YAFFS), but do you know what it means to have a file system that assumes an underlying flash device? This article introduces you to flash file systems for Linux, explores how they care for their underlying consumable devices (flash parts) through wear leveling, and identifies the various flash file systems available along with their fundamental designs.

"Anatomy of Real-Time Linux Archictures", IBM developerWorks, April 2008.
It's not that Linux isn't fast or efficient, but in some cases fast just isn't good enough. What's needed instead is the ability to deterministically meet scheduling deadlines with specific tolerances. Discover the various real-time Linux alternatives and how they achieve real time from the early architectures that mimic virtualization solutions to the options available today in the standard 2.6 kernel.

" Anatomy of Security-Enhanced Linux (SELinux), IBM developerWorks, April 2008.
Linux has been described as one of the most secure operating systems available, but the National Security Agency (NSA) has taken Linux to the next level with the introduction of Security-Enhanced Linux (SELinux). SELinux takes the existing GNU/Linux operating system and extends it with kernel and user-space modifications to make it bullet-proof. If you're running a 2.6 kernel today, you might be surprised to know that you're using SELinux right now! This article explores the ideas behind SELinux and how it's implemented.

" Explore Ubuntu Mobile and Embedded Edition", IBM developerWorks, January 2008.
Ubuntu Mobile and Embedded Edition (Part of the Ubuntu Gutsy Gibbon Linux Distribution) is an environment designed to simplify integration with UMPC (ultra-mobile PC). Ubuntu includes this support as part of its standard Linux environment. This tutorial introduces you to UME, the tools provided within it (such as Moblin), and how to build a full embedded environment for a supported UMPC.

" Application Development for the OLPC Laptop", IBM developerWorks, December 2007.
This tutorial continues from an earlier article on the OLPC (now called the XO-1). After providing an introduction to the XO-1 and its history, the tutorial shows you how to build a simple graphical application in Python for the XO-1. This includes an introduction to the basic interfaces and methods for integrating with the Sugar graphical environment.

"Anatomy of the Linux SCSI Subsystem", IBM developerWorks, November 2007.
SCSI is a collection of standards that define how to communicate with a large number of devices (mostly storage related). This article explores SCSI and how it's implemented within the Linux kernel. It also introduces some of the advances being made in SCSI, such as SAS, FCoE, and DIF.

"Anatomy of Linux Synchronization Methods", IBM developerWorks, October 2007.
This article explores the various synchronization methods available in the kernel (such as the atomic operations, spinlocks, reader/writer locks, and kernel semaphores). It also discusses concurrency and the reasons behind the need for the methods.

"Anatomy of the Linux file system", IBM developerWorks, October 2007.
This article explores filesystems within Linux, which are another great example of abstraction. For example, when you perform a 'read' operation, it could be directed to an ext2, ext3, JFFS, or any other type of filesystem, on many different types of storage media (ramdisk, USB flash stick, SAS disk, etc.). The number of combinations is huge, but the Linux filesystem layer provides a model abstraction for dealing with various filesystems on various media. This article introduces the Linux filesystem layer and then explores the major structures and APIs that implement this module.

" System Emulation with QEMU", IBM developerWorks, September 2007.
QEMU is a platform emulator which means that you can emulate an entire PC on another operating system (such as Linux or Windows). This article introduces the ideas behind QEMU, discusses some of its internals, and then demonstrates emulating another operating system on top of Linux.

"Anatomy of the Linux Networking Stack", IBM developerWorks, June 2007.
One of the greatest features of the Linux operating system is its networking stack. It was initially a derivative of the BSD stack and is well organized with a clean set of interfaces. Its interfaces range from the protocol-agnostic, such as the common sockets layer interface or the device layer, to the specific interfaces of the individual networking protocols. This article explores the structure of the Linux networking stack from the perspective of its layers and also examines some of its major structures.

"Anatomy of the Linux Kernel", IBM developerWorks, June 2007.
The Linux® kernel is the core of a large and complex operating system, and while it's huge, it is well organized in terms of subsystems and layers. In this article, you explore the general structure of the Linux kernel and get to know its major subsystems and core interfaces.

"Anatomy of the Linux slab allocator", IBM developerWorks, May 2007.
An operating system commonly allocates and deallocates objects of a fixed size. Additionally, these objects can be initialized to a given structure. The slab allocator exploits this common-size object behavior, and also makes it simple to expand or shrink the memory used by a given object pool. The slab allocator originated in SunOS, but now finds its home in the Linux kernel.
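
The object-caching idea described above can be illustrated with a toy sketch. This is not the kernel's implementation: the class name ObjectCache, its methods, and the slab_size parameter are all hypothetical, chosen only to show pre-initialized objects being reused from a pool that can grow and shrink.

```python
class ObjectCache:
    """Toy cache of same-sized, pre-initialized objects (hypothetical name)."""

    def __init__(self, ctor, slab_size=4):
        self.ctor = ctor            # builds one initialized object
        self.slab_size = slab_size  # objects added per grow() call
        self.free = []              # objects ready for reuse
        self.in_use = 0             # objects currently handed out

    def grow(self):
        # Add one "slab" worth of pre-initialized objects to the pool.
        self.free.extend(self.ctor() for _ in range(self.slab_size))

    def alloc(self):
        # Grow only when the pool is empty, then hand out a free object.
        if not self.free:
            self.grow()
        self.in_use += 1
        return self.free.pop()

    def release(self, obj):
        # Return the object to the pool instead of destroying it.
        self.in_use -= 1
        self.free.append(obj)

    def shrink(self):
        # Drop all currently unused objects; returns how many were dropped.
        dropped = len(self.free)
        self.free.clear()
        return dropped
```

In this sketch, an object that is released and then allocated again is the same object, which is the reuse the abstract refers to.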

"Discover the Linux Kernel Virtual Machine", IBM developerWorks, May 2007.
The newcomer to Linux virtualization is the Linux Kernel Virtual Machine, or KVM. This modification to the Linux kernel converts it into a hypervisor, allowing it to host other operating systems such as Linux and Windows. The Linux KVM requires a processor with virtualization instructions, as can be found with the AMD Pacifica or Intel VT.

"Sugar, the XO laptop, and One Laptop per Child", IBM developerWorks, April 2007.
OLPC is the One-Laptop-per-Child initiative, and its goal is to develop a $100 laptop for children around the world (now $150). The laptop itself is very interesting, as the laptop must be useful in different environments than our own. But what's most interesting about the XO laptop is that it runs GNU/Linux and is programmable using the Python language. This article explores the XO laptop and shows you how to build a simple activity (application) for a virtualized XO.

"Virtualization with coLinux", IBM DevelopWorks, April 2007.
Cooperation is probably the last thing that comes to mind when considering Linux and MS Windows, but that's what you get with coLinux. coLinux is a cooperative Linux kernel that virtualizes an entire Linux operating system on top of MS Windows. You can get something similar with Cygwin, but coLinux has some advantages.

"Linux and Symmetric Multiprocessing", IBM developerWorks, March 2007
Linux and SMP are two great tastes that taste great together. This article provides an introduction to multiprocessing (in particular Chip-Level Multiprocessing, or CMP) and then discusses some of the SMP features of the Linux kernel. It also briefly discusses how to exploit SMP for user-space applications.

"Parallelize Applications for Faster Linux Booting", IBM developerWorks, March 2007.
Linux out of the box is a general solution for desktop and server platforms. But booting Linux, especially if you're a developer (particularly a kernel developer), can be a pain due to the time it takes to complete. This article reviews two approaches to parallelizing the boot process through init replacements. Initng is a dependency-based solution: services depend on one another, and once a service has started, the services that depend on it can start. Upstart is an event-based solution to init: when a service starts, it can send events to kick off other services. Also explored in this article is bootchart, which is used to visualize the Linux boot process.
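
The dependency-based idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Initng itself is implemented: the service names and the start_waves helper are hypothetical, and the ordering uses the standard library's graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_waves(deps):
    """Group services into waves; services in one wave could start in parallel.

    deps maps each service to the set of services it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies started.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical services, for illustration only.
deps = {
    "syslog": set(),
    "network": set(),
    "sshd": {"network", "syslog"},
    "httpd": {"network"},
}

for number, wave in enumerate(start_waves(deps), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Services with no outstanding dependencies land in the same wave, which is where a parallel boot gains its time over a strictly sequential init.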

"Virtual Linux", IBM developerworks, December 2006.
Linux virtualization has many solutions, including full virtualization, para-virtualization, emulation, and many others. This article explores the various methods that are available today for Linux virtualization, including the new kid on the block, the Kernel Virtual Machine (KVM). Read the comments at Slashdot. This article has been translated into Russian and Korean.

"Build a Web Spider on Linux", IBM developerworks, November 2006.
The goal of this article is to explore the various methods for developing web spiders on Linux. It illustrates spider development using Python and Ruby. You can read the sordid comments on Slashdot or Digg.

"Data Visualization with Linux", IBM developerworks, Nobember 2006.
Linux is a great platform for data manipulation and visualization. From GNUPlot and Octave to Scilab and IBM's OpenDX, this article covers the most useful tools, with examples presented for each.

"Version Control for Linux", IBM developerworks, October 2006.
One of the great aspects of Linux for developers is the wide range of source configuration management (SCM) systems that are available. From centralized to distributed repositories, and change-set versus snapshot models, this article explores the major SCM architectures and provides examples of each. Also available in Japanese.

"New to IBM Systems", IBM developerworks, September 2006.
This article is fundamentally a marketing piece that introduces readers to IBM servers.

"Open Source Robotics Toolkits", IBM developerworks, September 2006.
This article reviews a number of open-source toolkits for robotics simulation and development, from the Open Dynamics Engine (ODE) for modelling realistic physics to TeamBots for modelling multi-agent systems. Here's an intro from LinuxDevices, and a discussion at robots.net.

"Boost Application Performance Using Asynchronous I/O", IBM developerworks, August 2006.
Asynchronous I/O (or AIO) is a POSIX mechanism to increase performance of overlapped I/O applications by providing callback mechanisms for I/O completion. This article explores the variety of I/O models available for Linux, and then digs into the AIO model with source demonstration. The article is now a reference on Wikipedia.

"BusyBox simplifies embedded Linux Systems", IBM developerworks, August 2006.
BusyBox is the Swiss Army knife of Linux utilities. BusyBox is interesting because it combines a large number of utilities into a single binary, allowing them to share the underlying common code. This makes it a perfect utility for embedded Linux systems. Here's an intro at LinuxDevices.com. Also available in Chinese.

"Anatomy of the Linux Initial Ramdisk (initrd)", IBM developerworks, July 2006.
The Initial Ramdisk (or initrd) is a temporary root filesystem in RAM that acts as an intermediary filesystem for module loading while the real root filesystem is not yet available. This article explores the anatomy of the initrd, and demonstrates how you can build one from scratch. This article has been translated into Japanese.

"Inside the Linux Scheduler", IBM developerworks Linux Zone, June 2006.
The Linux scheduler has evolved greatly over the years, and with the 2.6 kernel, has been transformed from an O(N) (linear time) scheduler to an O(1) (constant time) scheduler. This article discusses the new Linux 2.6 kernel and other aspects such as SMP support and load balancing. Here's an intro from LinuxDevices. This article has been translated into Japanese.

"Inside the Linux boot process", IBM developerworks Linux Zone, May 2006.
The Linux boot process is very flexible and supports booting on a large number of platforms from a variety of devices (hard disk, floppy, CD-ROM, USB Flash, network, etc.). This article walks you through the desktop x86 boot process and also provides some information on embedded and network booting.

"Access the Linux Kernel using the /proc filesystem", IBM developerworks Linux Zone, March 2006.
The /proc virtual filesystem is a great way to permit communication (configuration and monitoring) between user-space applications and the kernel. In this article you'll learn about /proc and explore a demonstration of a fortune cookie dispenser implemented as a kernel module with /proc. This article has been translated into Chinese and Japanese, and an introduction is available in Korean.

"Better networking with the Stream Control Transmission Protocol (SCTP)", IBM developerworks, February 2006.
In this article, I review the benefits of SCTP over TCP (from multi-streaming to multi-homing). Sample code is also presented demonstrating the multi-streaming feature. This article was also slashdotted. It has been translated into Chinese.

"Automate Client Management with the Service Location Protocol (SLP)", IBM developerworks, February 2006.
Discusses the zero-configuration networking capabilities of SLP and demonstrates its use with a simple Daytime protocol example. This article has been translated into Chinese.

"Boost Socket Performance on Linux", IBM developerworks Linux Zone, January 2006.
This article demonstrates four ways to boost the performance of sockets applications, from socket buffer tuning to kernel proc filesystem tuning. Also read the sordid Slashdot comments on this article.
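
As a rough illustration of socket buffer tuning, here is a minimal Python sketch that requests larger send and receive buffers and reads back what the kernel granted. The tune_buffers helper and the 64 KB figure are assumptions made for this example and are not taken from the article.

```python
import socket


def tune_buffers(sock, size):
    """Request send/receive buffers of `size` bytes; return what was granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    # The kernel may adjust the request, so read the effective values back.
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return granted_snd, granted_rcv


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    snd, rcv = tune_buffers(s, 64 * 1024)  # 64 KB is an arbitrary example size
    print(f"send buffer: {snd} bytes, receive buffer: {rcv} bytes")
```

The values read back may differ from the request. Linux, for example, doubles the requested size to leave room for bookkeeping and caps it at system-wide limits, so it is worth checking the granted size rather than assuming it.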

"Sockets Programming in Ruby", IBM developerWorks Linux Zone, October 2005.
Tutorial exploring the Sockets API and its integration into the Ruby object-oriented scripting language. Discusses Ruby-specific features for sockets programming.

"Sockets Programming in Python", IBM developerWorks Linux Zone, October 2005.
Tutorial exploring the Sockets API and its integration into the Python language. Discusses Python-specific features for sockets programming.

"Five pitfalls of Linux sockets programming", IBM developerWorks Linux Zone, September 2005.
Discusses the development of reliable networking applications in heterogeneous environments. Translations are available in Japanese and Chinese.

"Visualize Function Calls withi Graphviz", IBM developerWorks Linux Zone, June 2005.
Using the GNU Compiler Toolchain and a small amount of glue code, a dynamic graphical function call generator can be easily created. Translations of this article are available in Japanese and Chinese.

"GNU's C Language Extensions", C/C++ Users Journal, March 2005.
The GNU Compiler includes a variety of language extensions. This article explores some of the more useful elements.

"Optimizing with GCC", Linux Journal, March 2005.
The GNU Compiler Collection (otherwise known as GCC) is the de facto standard compiler for Linux and also multi-platform embedded development. This article discusses the 3.3 GCC optimizer and how to use it effectively to build optimized applications.

"Defensive Programming", C/C++ Users Journal, February 2005.
This article focuses on programming for reliability -- given that we make mistakes in the development of software, how can we program in a way that minimizes some common, and difficult to debug, mistakes?

"GNU Development", Circuit Cellar, January 2004.
This article provides a tour of software development with GNU tools, including the GNU compiler toolchain, build automation with make and a variety of other utilities. This article is also available on Developer::Pipelines.

"An Embeddable Lightweight XML-RPC Server", Dr. Dobb's Journal, June 2003.
The XML-RPC protocol is explored in this article, with a simple implementation of a server in C. The server is then demonstrated using a C and Python client.

"Personalization and Adaptive Resonance Theory", Dr. Dobb's Journal, October 2002.
This article discusses the use of the ART1 clustering algorithm for personalization (recommendation).

"Java Mobile Agents and the Aglets SDK", Dr. Dobb's Journal, January 2002.
Demonstrates the construction of simple mobile agents (migratory programs) in Java using IBM's Aglets SDK.

"Embed with the Mailman", Embedded Systems Programming, October 2001.
An SMTP server and client are discussed with source code suitable for use in embedded systems. The SMTP client is discussed in applications of remote statusing (emitting data to a remote client). The SMTP server is discussed from the perspective of command and control (sending emails to the embedded device with commands, and receiving back responses from the onboard SMTP client).

"An Embeddable HTTP Server", Dr. Dobb's Journal, October 2001.
An embeddable HTTP server is presented that is suitable not only for embedded systems, but those without file systems (EEPROM-based). The concept of an application filesystem is presented along with the tools to build and integrate it with the HTTP server.

"Embedded Linux on the PowerPC", Embedded Linux Journal, July 2001.
In this article, I explore the use of Linux (MontaVista Linux) on the Embedded Planet RPX-Net PowerPC board.

"CPJazz -- A Software Framework for Vehicle Systems Integration and Wireless Connectivity"
SAE 2000 World Congress. Also appears in the book "Intelligent Vehicle Systems", ISBN 0-7680-0588-4.
Discusses research in connectivity between disparate devices and vehicle buses to disparate wireless assets in a vehicle environment. The flexible CPJazz architecture provides a "software bus" architecture to seamlessly integrate buses and devices for intercommunication.

Presentations

"High Performance Networking", May 2003
I gave this presentation as a guest lecture in Sam Siewert's "Real-Time Embedded Systems" course at Colorado University in the Spring of 2003. In this presentation, I discussed problems and solutions for scaling TCP/IP networking to gigabit networks through a variety of means.

mtj@mtjones.com.

Last Updated December 2011.

	This book continues from the first edition and provides everything that you'll need to develop applications in the GNU/Linux environment. Split into 5 distinct parts, the book covers GNU tools, topics in application development, shells and scripting, debugging and hardening, and introductory topics including the fundamentals of virtualization. New material in this edition includes exploration of advanced APIs, memory debugging, scripting with Python and Ruby, source control, and more. Published April 2008. See more at Course Technology.
	This new AI text, focused on the academic market, provides grounding in the theory and practice of the various AI algorithms and methods. The text also follows a "systems" approach, discussing the application of the algorithms in real-world environments. Released Q4 2007. See more at Infinity Science Press.
	This second edition of AI Application Programming builds on the first edition with the addition of five new chapters: classifier systems, natural language processing, particle swarm optimization, A-Star pathfinding, and reinforcement learning. The text also includes corrections for the errata from the first and second printings, additions to the backpropagation learning chapter (for batch updating), and numerous other additions. Here's a review from the version sold in India through Wiley/Dreamtech. Another review at a Sun blog. More reviews on Amazon, and an interesting blog entry on the book's discussion of evolutionary algorithms and neural networks. The book is also being used as part of a course in game AI at Taiwan's National Chiao Tung University. It's also the text for "Advanced CS Topics: Artificial Intelligence" at Graceland University. The book is also a recommended text at CWU's course in Computational Intelligence.
	Applications are wide and varied in GNU/Linux, and include not only pure applications, but also tools and utilities for the GNU/Linux environment. GNU/Linux Application Programming takes a holistic approach to teaching developers GNU/Linux programming using APIs, tools, communication, and scripting. Covering a wide range of topics related to GNU/Linux application programming, the book is split into five parts: The GNU/Linux Operating System; GNU Tools; Processes, Communication, and Coordination; Shells and Scripting; and Debugging. Application and tool developers are introduced to the most useful aspects of the GNU/Linux operating system, including tools (compilation, automated build and package creation), standard libraries, communication and synchronization APIs, process and thread models, shell scripting and extension languages, and debugging. After working through the text, programmers will have a solid foundation for developing applications in the GNU/Linux environment. [Why do I use GNU/Linux instead of just Linux?]
	In AI Application Programming, I've demonstrated the applicability of AI algorithms and techniques in a range of different problem areas. My goal with this book was to illustrate that AI can be used as a tool to solve practical problems. Some of the AI algorithms that I discuss in the book include genetic algorithms, neural networks, rules-based systems, clustering algorithms, ant algorithms, fuzzy logic, hidden Markov models, simulated annealing and intelligent agents. Sample applications include a personalization engine, a rules-based reasoning system, a character trainer for game AI, a Web-based news agent, a genetic code optimizer, an artificial life simulation, and others. Here's a review from Tom Copeland (who has a rubyforge project to port the C source examples to the Ruby language). Here's another from ADTmag.com, and one more. The version published in India (through Wiley-Dreamtech India Pvt. Ltd.) is reviewed here. More reviews are available through Amazon. This book is currently being used as a textbook in Dr. Mario Aguilar's "Applied Artificial Intelligence" at Jacksonville State University, Tom Fernandez' Artificial Intelligence course at Florida Atlantic University, and Prof Paul McKevitt's Advanced Intelligent Multimedia at the University of Ulster. In 2004, the book was translated into Russian by DMK Press. An erratum is available for this book here.
	The Sockets API is useful not only in traditional high-level language environments (such as C), but also in any worthwhile scripting language. This book details the Sockets API, and discusses its use in languages such as C, Perl, Python, Tcl, Ruby and Java. Sample patterns are provided for each of the languages, such as an HTTP server and an SMTP client. This book is split into three parts. Part one details the Sockets API. Part two discusses the Sockets API in each of the languages explored. Finally, in Part three, software patterns are demonstrated for each of the languages, illustrating the strengths and weaknesses of the APIs for each language. Here's a review from Dan Weeks (also available on Amazon). Here's another one from Don Wilde at daemonnews. An errata item is noted here.
	In this book, I explore a variety of application layer protocols and then illustrate how they can be used in embedded (or non-embedded) systems. For example, how can HTTP be used to control embedded devices, or how can SMTP be used for communication of data or control from remote embedded systems? This book explores these topics, discusses all protocols in detail (with open source implementations), and demonstrates an application in a practical domain. Protocols discussed include HTTP, SMTP (server and client), NNTP, POP3, SLP, SNMP and an embeddable CLI. The book has been translated into Chinese; the link is available here. The book can be purchased in India through Firewall Media here. It's also on Wind River's suggested reading list here. An erratum is available for this book here.