
Linux for Data Acquisition

Linux is fast becoming the operating system of choice for test and measurement as well as process-control applications based on traditional desktop hardware. The primary reasons for this are robustness of the operating system and the availability of source code.

Over the past two years, Linux has been making inroads against Microsoft Windows as a platform for desktop data acquisition and control applications. This is particularly apparent with applications that require 24/7 operation.

Although some may argue that Microsoft has made strides, Windows has had a poor reputation for robustness in the desktop environment. (Windows CE and Embedded Windows NT are outside the scope of this article.) This, even more so than the capability to modify the Linux kernel, has made many end users switch to Linux for such applications.

Another trend over the past two years is the movement from ISA-based data acquisition devices to PCI as well as CompactPCI data acquisition devices. This trend not only is likely to continue but also probably will accelerate due to the dwindling number of desktops that support the ISA bus.

The upside to this conversion from the ISA bus to the PCI bus is that PCI permits us to do things that ISA-based hardware was not capable of doing. These new capabilities allow data acquisition and control manufacturers to build devices that not only do more but also are less expensive to produce than their ISA-based counterparts.

The development of PCI-based data acquisition systems under the Linux operating system comes down to designing device hardware and driver software that can take advantage of PCI-based features. The most important of these is the capability to perform bus-mastered direct memory access (DMA) operations. While asynchronous I/O certainly is important in the implementation of data acquisition and control drivers, we are going to concentrate on the more challenging task of moving large amounts of data very quickly via DMA.

Without delving into the details of Linux driver architecture, a few things must be addressed for clarity. First, when a driver is loaded under Linux, it is loaded along with the rest of the Linux kernel and actually becomes part of the kernel itself. Second, you interface to the driver through the IOCTL entry point much the same way as you would interface to a standard Windows driver. Though the mechanics of opening and closing the driver differ, many parallels exist between a Linux driver and a Windows-based driver.
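For reference, here is a minimal sketch of such a driver interface for a 2.4-series kernel. The device name, command code, and function names are hypothetical and are not taken from the driver discussed in this article.

// Minimal sketch of a 2.4-era character-driver interface with an IOCTL
// entry point. The names and command code here are hypothetical.
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/errno.h>
#include <linux/ioctl.h>

#define DAQ_IOC_START  _IO('d', 1)         // hypothetical ioctl command

static int daq_open(struct inode *inode, struct file *filp)
{
    return 0;                              // claim hardware, init state
}

static int daq_release(struct inode *inode, struct file *filp)
{
    return 0;                              // stop acquisition, release hardware
}

static int daq_ioctl(struct inode *inode, struct file *filp,
                     unsigned int cmd, unsigned long arg)
{
    switch (cmd) {
    case DAQ_IOC_START:
        // configure and start the acquisition here
        return 0;
    default:
        return -EINVAL;
    }
}

static struct file_operations daq_fops = {
    owner:   THIS_MODULE,
    open:    daq_open,
    release: daq_release,
    ioctl:   daq_ioctl,
};

static int daq_major;                      // dynamically assigned major number

int init_module(void)
{
    daq_major = register_chrdev(0, "daq", &daq_fops);
    return (daq_major < 0) ? daq_major : 0;
}

void cleanup_module(void)
{
    unregister_chrdev(daq_major, "daq");
}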

In an ISA system, you would set up a DMA operation to a single block of memory (Figure 1). When the block was filled, you would get a DMA-complete interrupt, and your interrupt-handling code would have to acknowledge the interrupt, shut down the DMA, update pointers and status variables, prepare the next DMA block, and then re-enable the DMA (a sketch of such a handler follows the list below).

While your driver code is performing this activity, the device is buffering data internally until the driver has re-enabled the DMA process and data can again flow freely from the device to PC memory. The ramifications of this are twofold:

  • The device must allow enough buffering so no data is lost while the driver is cleaning up the last DMA complete interrupt and preparing the next DMA block. This means added expenses for buffer memory and buffer management logic.
  • Valuable CPU time is being burned processing and managing the DMA process.
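For illustration, here is a hedged sketch of the per-block housekeeping such an ISA-style handler must perform, written with the Linux kernel's ISA DMA helpers. The DMA channel, block size, and daq_state bookkeeping fields are hypothetical and not tied to any particular product.

// Hedged sketch of an ISA-style DMA-complete handler (2.4-era kernel).
// The channel number, block size, and daq_state fields are hypothetical.
#include <linux/interrupt.h>
#include <asm/dma.h>

#define DAQ_DMA_CHANNEL  5                 // hypothetical 16-bit ISA channel
#define DAQ_BLOCK_SIZE   4096              // bytes per DMA block

struct daq_state {                         // hypothetical driver bookkeeping
    unsigned long block_phys[8];           // physical address of each block
    int           block_count;
    int           next_block;
    unsigned long blocks_done;
};

static void daq_isa_dma_isr(int irq, void *dev_id, struct pt_regs *regs)
{
    struct daq_state *s = dev_id;
    unsigned long flags;

    flags = claim_dma_lock();
    disable_dma(DAQ_DMA_CHANNEL);          // stop the completed transfer
    clear_dma_ff(DAQ_DMA_CHANNEL);

    // Update pointers and status for the block just filled
    s->blocks_done++;
    s->next_block = (s->next_block + 1) % s->block_count;

    // Program and re-enable DMA for the next block; the device must buffer
    // data internally until this housekeeping completes
    set_dma_mode(DAQ_DMA_CHANNEL, DMA_MODE_READ);
    set_dma_addr(DAQ_DMA_CHANNEL, s->block_phys[s->next_block]);
    set_dma_count(DAQ_DMA_CHANNEL, DAQ_BLOCK_SIZE);
    enable_dma(DAQ_DMA_CHANNEL);
    release_dma_lock(flags);
}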

With the advent of PCI, we gain the advantage of bus-mastered DMA. Bus-mastered DMA allows the device itself to control the DMA operation with little or no intervention from the host CPU (Figure 2). Many PCI controller-chip manufacturers support DMA block chaining or scatter-gather. DMA block chaining under the PCI bus allows for a seamless DMA across a number of driver-predefined memory blocks without intervention from the driver or the CPU.

So what does DMA block chaining do for us? Well, it allows us to define a series of memory blocks into which the DMA is to occur without getting the driver involved in managing the DMA process.

Using chained-block DMA, we can add an entry after the last physical block to point to the first (Figure 3). By doing so, we, in effect, have created a circular buffer in the PC memory which, when configured in that manner, allows us to continuously DMA into memory without any intervention from the CPU. This means our relatively expensive hardware-based FIFO can be replaced with an inexpensive and nearly infinitely large PC RAM-based FIFO.
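The chain itself is made up of small descriptor blocks in PC memory. Here is a sketch of the descriptor layout assumed by the listings that follow; the structure name is hypothetical, but the field names match those used in the code, and the control flags occupy the low four bits of the next-descriptor pointer, as the listings below assume.

// Assumed layout of one chain descriptor as used in the listings below.
// The structure name is hypothetical; the field names match the
// dmaInpChain[] entries in the driver code.
typedef struct _DMA_CHAIN_DESC
{
    DWORD pciAddr;             // PCI (PC memory) address of this block
    DWORD localAddr;           // local-bus (device) source address
    DWORD transferByteCount;   // bytes to move for this block
    DWORD descriptorPointer;   // physical address of the next descriptor;
                               // control flags live in the low 4 bits
} DMA_CHAIN_DESC, *PDMA_CHAIN_DESC;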

A Windows-based system requires that the DMA blocks be broken into 4-kB pieces because of page-size limitations, so a number of blocks is needed to make up the RAM-based FIFO. In a Windows-based driver, then, the RAM FIFO must be broken into 4-kB blocks and chained together.

Here is how the code may look to configure a RAM-based FIFO using multiple DMA blocks.

//----------------------------------------------------------------------
//
// Routine Name: adcSetupChainDescriptors
//
// Description:  Sets up the chain descriptor blocks for the PC-based
//               FIFO. This routine allows for chaining of multiple
//               memory blocks within the FIFO. Each block address is
//               programmed into the PLX 9080 DMA PCI bus-master
//               controller. The FIFO is implemented via a circular
//               chain of DMA block addresses in which the last block
//               points back to the first block.
//
// Enter With:   device extension address
//
// Returns:      function return value = physical addr. of the
//               first valid desc. block
//
//----------------------------------------------------------------------
DWORD
adcSetupChainDescriptors(PDEVICE_EXTENSION pde)
{
    PHYSICAL_ADDRESS physicalAddress;
    PWORD bufPhyAddr;
    PWORD bufSysAddr = (PWORD)pde->dmaInpBuf;
    DWORD blockSize, firstBlockSize, lastBlockSize;
    DWORD bufSize = DMAINPBUFSIZE * 2L;
    DWORD bufCount = 0;
    WORD descIndex = 0;
    DWORD firstBlockAddr;

    physicalAddress = 0;
    bufPhyAddr = (PWORD)virt_to_phys((PVOID)bufSysAddr);

    // Compute the number of bytes actually used in the first and last pages
    firstBlockSize = blockSize = 0x00001000L - ((DWORD)bufPhyAddr & 0x00000fff);
    lastBlockSize = 0x00001000L - firstBlockSize;

    // Find the first valid descriptor block
    while(pde->dmaInpChain[descIndex].descriptorPointer == 0xffffffffL) descIndex++;

    // Remember the physical address of the first valid descriptor block so it
    // can be used to close the circular chain and returned to the caller
    firstBlockAddr = virt_to_phys((PVOID)&pde->dmaInpChain[descIndex]);

    while(TRUE)
    {
        // Save the destination (PC memory) address in the descriptor block
        pde->dmaInpChain[descIndex].pciAddr = virt_to_phys(bufSysAddr);

        // Save the source address in the descriptor block
        pde->dmaInpChain[descIndex].localAddr = (ULONG)(&nulldaqMemoryMap->acqResultsFIFO);

        // Save the count in the descriptor block
        pde->dmaInpChain[descIndex].transferByteCount = blockSize;

        // Set the next descriptor address
        // Control: bit 0 = 1 --> Descriptor address is in pci address space
        //          bit 1 = 0 --> Not end of chain
        //          bit 2 = 0 --> No interrupt on terminal count
        //          bit 3 = 1 --> Direction is local bus to pci bus
        pde->dmaInpChain[descIndex].descriptorPointer =
            (pde->dmaInpChain[descIndex].descriptorPointer & 0xfffffff0L) | 0x00000009L;

        // Update the buffer count
        bufCount += blockSize;

        // Assumes 2 byte sample size
        bufSysAddr += (blockSize / 2);

        // Check if we are done
        if(bufCount >= bufSize)
        {
            // Done, this is the last block, so its descriptor pointer must
            // point back to the first block to complete the circular chain
            pde->dmaInpChain[descIndex].descriptorPointer = (firstBlockAddr & 0xfffffff0L) | 0x00000009L;

            // Done, check if the last block is a partial block
            if(bufCount > bufSize) pde->dmaInpChain[descIndex].transferByteCount = lastBlockSize;

            // And exit out of the loop
            break;
        }

        // Not done, remaining blocks need to be added to the chain
        blockSize = 0x1000;

        // Find the next valid descriptor block
        descIndex++;
        while(pde->dmaInpChain[descIndex].descriptorPointer == 0xffffffffL) descIndex++;
    }

    // Return the physical address of the first valid descriptor block
    return firstBlockAddr;
}

//----------------------------------------------------------------------
//
// Routine Name: adcSetupDmaTransfer
//
// Description:  Sets up pci dma channel 1 on the PLX 9080 to
//               transfer acquired data from the acquisition board
//               to pc memory. Uses adcSetupChainDescriptors to set up
//               the circular chained FIFO for multiple memory blocks.
//
// Enter With:   device extension address
//
// Returns:      dma channel 1 setup
//
//----------------------------------------------------------------------
VOID
adcSetupDmaTransfer(PDEVICE_EXTENSION pde)
{
    BYTE *base9080;
    DWORD firstBlockAddr;

    firstBlockAddr = adcSetupChainDescriptors(pde);

    // Setup 9080 DMA Channel 1 for Chaining:
    // Get the base address of the 9080 configuration registers
    base9080 = (BYTE *)(pde->plxVirtualAddress);
    *(DWORD *)(base9080 + PCI9080_DMA1_MODE) = 0x00021ac1L;

    // Set PCI transfer address (destination address)
    *(DWORD *)(base9080 + PCI9080_DMA1_PCI_ADDR) = 0x00000000L;

    // Set local bus transfer address (source address)
    *(DWORD *)(base9080 + PCI9080_DMA1_LOCAL_ADDR) = 0x00000000L;

    // Set number of bytes to transfer
    *(DWORD *)(base9080 + PCI9080_DMA1_COUNT) = 0x00000000L;

    // Set direction local bus to pci bus, next descriptor address
    *(DWORD *)(base9080 + PCI9080_DMA1_DESC_PTR) = (firstBlockAddr & 0xfffffff0L) | 0x00000009L;

    // Set dma fifo thresholds
    //   dma fifo size = 16
    //   PCI to local almost full = 1
    //   PCI to local almost empty = 1
    //   local to PCI almost full = 1
    //   local to PCI almost empty = 1
    *(WORD *)(base9080 + PCI9080_DMA1_THRESHOLD) = 0x0000;
}


Here, the first routine, adcSetupChainDescriptors, builds the chain of descriptors, one for each memory block in the FIFO. The second routine, adcSetupDmaTransfer, performs the actual setup of the PCI controller chip (in this case, a PLX 9080).
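Once the chain and the channel registers have been programmed, the channel still must be enabled and started. The fragment below is a hypothetical sketch of that last step: the PCI9080_DMA1_CSR symbol and the enable/start bit positions are assumptions about the PLX DMA command/status register, not part of the listings above, and should be verified against the data sheet.

// Hypothetical start sequence. PCI9080_DMA1_CSR and the enable/start bit
// positions are assumptions about the PLX DMA command/status register.
VOID
adcStartDmaTransfer(PDEVICE_EXTENSION pde)
{
    BYTE *base9080 = (BYTE *)(pde->plxVirtualAddress);

    // Build the descriptor chain and program DMA channel 1
    adcSetupDmaTransfer(pde);

    // Enable the channel (assumed bit 0), then start it (assumed bit 1)
    *(BYTE *)(base9080 + PCI9080_DMA1_CSR) = 0x01;
    *(BYTE *)(base9080 + PCI9080_DMA1_CSR) |= 0x02;
}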

Linux, however, allows the contiguous allocation of data blocks (Figure 4). While this will not improve the efficiency of the transfer, it certainly simplifies the design of the RAM-based FIFO. 
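For example, the FIFO buffer itself might be allocated in one physically contiguous piece. The fragment below is a minimal sketch under the assumption that the FIFO fits within kmalloc's contiguous-allocation limit; larger FIFOs would typically be built with __get_free_pages(). The adcAllocInputFifo name is hypothetical.

// Minimal sketch, assuming the FIFO fits within kmalloc's contiguous-
// allocation limit; larger FIFOs would normally use __get_free_pages().
#include <linux/slab.h>
#include <linux/errno.h>

static int adcAllocInputFifo(PDEVICE_EXTENSION pde)
{
    // DMAINPBUFSIZE samples of 2 bytes each, as in the listings above
    pde->dmaInpBuf = kmalloc(DMAINPBUFSIZE * 2, GFP_KERNEL);
    if (pde->dmaInpBuf == NULL)
        return -ENOMEM;

    // kmalloc memory is physically contiguous, so a single descriptor
    // built from virt_to_phys(pde->dmaInpBuf) can describe the whole FIFO
    return 0;
}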
With a contiguous buffer, the adcSetupChainDescriptors routine can be simplified as follows:

DWORD
adcSetupChainDescriptors(PDEVICE_EXTENSION pde)
{
    PWORD bufSysAddr = (PWORD)pde->dmaInpBuf;
    DWORD bufSize = DMAINPBUFSIZE * 2L;
    DWORD firstBlockAddr;

    // Remember the physical address of the single descriptor block so it
    // can be returned to the caller
    firstBlockAddr = virt_to_phys((PVOID)&pde->dmaInpChain[0]);

    // The buffer is physically contiguous, so one descriptor describes the
    // entire RAM-based FIFO
    pde->dmaInpChain[0].pciAddr = virt_to_phys(bufSysAddr);

    // Save the source address in the descriptor block
    pde->dmaInpChain[0].localAddr = (ULONG)(&nulldaqMemoryMap->acqResultsFIFO);

    // Save the count in the descriptor block
    pde->dmaInpChain[0].transferByteCount = bufSize;

    // Set the next descriptor address
    // Control: bit 0 = 1 --> Descriptor address is in pci address space
    //          bit 1 = 0 --> Not end of chain
    //          bit 2 = 0 --> No interrupt on terminal count
    //          bit 3 = 1 --> Direction is local bus to pci bus
    //
    // The descriptor pointer of this single block points back to the block
    // itself to complete the circular chain
    pde->dmaInpChain[0].descriptorPointer = (firstBlockAddr & 0xfffffff0L) | 0x00000009L;

    return firstBlockAddr;
}

The question now becomes: Do we need any data buffering on our device at all? The answer is yes. Buffering on the device cannot be completely eliminated since we need to withstand latencies on the PCI bus interface itself. This means we must be able to buffer small amounts of data until the PCI bus can get around to servicing the device's DMA requests. However, most PCI controller chips have built-in buffers for this purpose, so the added design cost is negligible. You still may find hardware-based FIFOs on some manufacturers' PCI data acquisition devices. This normally is due to expediency rather than design: many PCI devices have been ported from ISA-based equivalents, and the FIFOs simply were retained.

But what about our interrupt? Since we no longer need to use it to manage the transfer of data, do we still need it? The answer is maybe. We still need to monitor the progress of our data transfers so we can use the DMA block complete interrupt to manage state variables and update pointers.

Under most PCI controller implementations, you still can receive your DMA block-complete interrupt. You simply no longer have to clean up the old DMA and set up the new one.

However, we can take this a step further. With the appropriate hardware and PCI controller implementations, it is possible to eliminate the need for the interrupt entirely by using a thread or timer that can wake up periodically and poll the progress of the DMA process. Using the thread or timer wake-up and poll method means that the hardware need not implement the interrupt at all, freeing up a valuable resource within the PC.
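Here is a hedged sketch of such a polling scheme built on a 2.4-era kernel timer. The adcReadDmaProgress() helper, the fifoWritePos and readWait fields, and the polling period are hypothetical; how DMA progress actually is read back depends on the controller.

// Hedged sketch of an interruptless polling scheme using a 2.4-era kernel
// timer. adcReadDmaProgress(), fifoWritePos, and readWait are hypothetical.
#include <linux/timer.h>
#include <linux/sched.h>

#define DAQ_POLL_JIFFIES (HZ / 100)              // poll every 10 ms

extern DWORD adcReadDmaProgress(PDEVICE_EXTENSION pde);  // controller specific

static struct timer_list daqPollTimer;

static void daqPollDma(unsigned long data)
{
    PDEVICE_EXTENSION pde = (PDEVICE_EXTENSION)data;

    // Advance the software write pointer by the number of new bytes the
    // bus master has deposited in the RAM FIFO since the last poll
    pde->fifoWritePos = (pde->fifoWritePos + adcReadDmaProgress(pde))
                        % (DMAINPBUFSIZE * 2);

    // Wake any reader waiting for data
    wake_up_interruptible(&pde->readWait);

    // Re-arm the timer for the next poll
    mod_timer(&daqPollTimer, jiffies + DAQ_POLL_JIFFIES);
}

static void daqStartPolling(PDEVICE_EXTENSION pde)
{
    init_timer(&daqPollTimer);
    daqPollTimer.function = daqPollDma;
    daqPollTimer.data     = (unsigned long)pde;
    daqPollTimer.expires  = jiffies + DAQ_POLL_JIFFIES;
    add_timer(&daqPollTimer);
}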

If you are worried about the latency of a thread vs. an interrupt, then you may want to use the interrupt scheme anyway. This, however, is only necessary when you need to see the input data as quickly as possible.

Also remember that you should never lose data because you can make your RAM-based FIFO as large as necessary to take care of any latency in the thread-based architecture. In the end, the interruptless system is much more efficient and, in most cases, performs very well.

About the Author

Scott Ludwick, the director of software engineering at IOtech, has 12 years experience in data acquisition and control. In 1988, he earned a B.S. in computer engineering from the University of Akron. IOtech, 25971 Cannon Rd., Cleveland, OH 44146, 440-439-4091, e-mail: [email protected].


Embedded Linux

The interest in Linux as an embedded OS has exploded in recent years, and the reasoning behind that interest goes much deeper than the fact that it is a free OS. Let us briefly explore some of the more significant reasons Linux is being implemented in, or investigated for use in, a broad spectrum of products including data acquisition and instrumentation equipment.

The primary advantages of embedding Linux are the availability of Linux source code and the freedom for developers to modify the OS as needed. This independence from third-party software providers allows developers to chart their own course in implementation, whether they wish to add or remove functions or optimize for size or speed as their application demands. This can be quite important for small companies that don't have the volume to induce third-party providers to make custom changes.

Stability is an attractive feature for embedded applications that may be required to run for weeks, months, or even years at a time without glitching or hanging. Because Linux is a Unix derivative and because of the large developer community that is embracing and extending Linux, it has grown to be a very stable platform. This is evidenced by its use in Internet servers where 24/7 operation is expected.

The integrated network capability (TCP/IP) of the Linux OS also appeals to developers who are being asked more and more to provide networked features for their products. Many third-party OS suppliers charge extra for TCP/IP stack implementations.

Real-time extensions are available for engineers who have hard real-time requirements for their applications. These for-fee extensions are not part of the Linux kernel but rather run Linux as a task in a real-time environment.

The popularity of Linux/Unix in academia and with the latest generation of engineering students provides graduating engineers with a strong Linux knowledge base. Embedded Linux reduces product cost or increases margins because it’s royalty free. As the Linux developer community continues to develop the kernel and expand the number of microprocessors supporting Linux, it will become increasingly viable as the OS for a broad spectrum of embedded applications.

There are positives and negatives to most things in life, and Linux is no exception. There are concerns that should be raised when studying the question of using Linux or other available embedded OS choices. Let’s quickly address some of the caveats of embedded Linux.

One concern that looms large in the minds of many embedded Linux developers is the GNU General Public License (GPL). In very general terms, the GPL states that if you modify software that is under the GPL and then distribute that software, you must make the source code available to anyone who wants it. (See www.gnu.org for the license in its entirety.) While I wholeheartedly support the GPL as it relates to general computing devices such as PCs, SBCs, or other standardized computing platforms and general kernel improvements and bug fixes, I (along with others in the industry) believe that some clarification of terms would be appropriate to allow it to fully encompass the embedded space.

Unfortunately, a treatment of the GPL as it relates to embedded Linux is beyond the scope of this article. Suffice it to say you will need to draw your own conclusions about the impact of the GPL on your ability to produce a product and protect your intellectual property.

Another point to consider: What is the real cost of developing an embedded Linux product? While it’s possible to download most of the tools you’ll need to develop your application for free, getting those tools up and running can be a significant task. Finding support when you run into problems must be considered.

Several commercial packages for embedded Linux provide development environments and varying levels of hand-holding. These development seats can cost as much as packages from commercial OS providers. You also must consider how the Linux development environment fits into your established code control and archive utilities.

Derrik Weeks
e-mail: [email protected]


Published by EE-Evaluation Engineering
All contents © 2002 Nelson Publishing Inc.
No reprint, distribution, or reuse in any medium is permitted
without the express written consent of the publisher.

February 2002

