An inside look at PCI/104-Express

The PC/104 Embedded Consortium recently announced its latest specification, PCI/104-Express. In this expanded view, which amounts to a primer on the specification, one of the creators explains the thinking behind it and illustrates some of the finer details, such as stacking and PCI Express (PCIe) routing.

The PCI/104-Express specification establishes a standard method for using the high-speed PCI Express bus in embedded applications. It was developed by the PC/104 Embedded Consortium and adopted by member vote this March. The PC/104 Embedded Consortium chose PCI Express because of its desktop/laptop PC market adoption, performance, scalability, and growing silicon availability worldwide. It provides a new high-performance physical interface while retaining software compatibility with existing PCI infrastructure.

Incorporating the PCI Express bus within the industry-proven PC/104 architecture provides embedded applications many advantages, including fast data transfer, low cost because of PC/104’s unique self-stacking bus, high reliability because of PC/104’s inherent ruggedness, and long-term sustainability. Figure 1 shows the layout, and Figure 2 shows a board using PCI/104-Express.

Figure 1

Figure 2

PCI and PCI Express

The PCI bus has been the standard bus in desktop PCs for nearly 20 years. In PC/104-Plus and PCI-104, it is a 32-bit, 33 MHz synchronous parallel bus. Newer desktop PCs provide PCI Express connectors instead.

PCI Express is similar to PCI from the software view but much different in hardware. It is a point-to-point, high-speed, differential serial bus composed of lanes and links. PCI Express uses a packet-based model similar to Ethernet, and Gen 1 PCI Express runs at 2.5 Gbps per lane per direction. Because every link is point-to-point, a host can transmit to and receive from any or all devices simultaneously.

Understanding lanes and links: x1, x4, x8, and x16

A PCI Express lane is a transmit differential pair and a receive differential pair connection between two devices. A PCI Express link is one or more lanes and a clock differential pair. Therefore, a x1 (read as "by 1") link has one transmit/receive lane and one clock differential pair. Likewise, a x4 link has four transmit/receive lanes and one clock differential pair. This continues logically for x8 and x16 links, each having 8 or 16 transmit/receive lanes and one clock differential pair. Table 1 shows the bandwidth for various link configurations.
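The link-width arithmetic can be sketched in a few lines of Python. This is only an illustration, not a reproduction of Table 1: it starts from the 2.5 Gbps per lane per direction figure given above and counts both directions, which yields the 10,000 MBps x16 number used later in this article. The 8b/10b column reflects the line coding that Gen 1 PCI Express uses in general; the function and constant names are made up for this example.

```python
GEN1_GBPS_PER_LANE = 2.5      # raw signaling rate per lane per direction (from the article)
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line coding used by Gen 1 PCI Express


def link_bandwidth_mbps(lanes, both_directions=True, encoded=False):
    """Return the bandwidth in MBps of a Gen 1 link with the given lane count."""
    gbps = GEN1_GBPS_PER_LANE * lanes
    if both_directions:
        gbps *= 2                     # transmit and receive pairs run concurrently
    if encoded:
        gbps *= ENCODING_EFFICIENCY   # 10 bits on the wire carry 8 bits of data
    return gbps * 1000 / 8            # Gbps -> MBps


for lanes in (1, 4, 8, 16):
    raw = link_bandwidth_mbps(lanes)
    usable = link_bandwidth_mbps(lanes, encoded=True)
    print(f"x{lanes:<2} raw {raw:7.0f} MBps   after 8b/10b {usable:7.0f} MBps")
```

Packet headers and flow control reduce the usable figure further, so real throughput will be lower than either column.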

Implementing PCI Express

PCI/104-Express add-in cards are identified by link size. The specification supports four x1 links and one x16 link. Users can place any four x1 cards on the CPU in any order and all of them will work. Each uses its own PCI Express link.

Several choices exist if PCI cards are available to use in the system. One option is to stack the CPU, then the PCI/104-Express cards, and then the PCI cards, being careful not to exceed the PCI specification limits. Users also can stack the PCI and PCI Express cards on opposite sides of the CPU; for example, they can stack PCI Express cards below the CPU and PCI cards above the CPU. Additionally, if a x1 link card uses a PCI Express switch, which is similar in concept to an Ethernet switch, it can replace the link on the bus and share the bandwidth with another x1 card. This capability means that the number of add-in cards will not be the limiting factor for PCI/104-Express systems.
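The x1 link budget described above can be sketched as a small Python model. It is a deliberate simplification, not the specification's mechanism: it assumes each card takes the next free x1 link as the stack is walked outward from the CPU, and that a card with a switch puts one replacement link back on the bus behind the remaining CPU links. The `Card` class, the `has_switch` flag, and the link labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

X1_LINKS_FROM_CPU = 4  # per the article: the bus carries four x1 links


@dataclass
class Card:
    name: str
    has_switch: bool = False  # True if the card regenerates the x1 link it consumes


def assign_x1_links(stack):
    """Walk the stack outward from the CPU and give each card one x1 link.

    Returns a list of (card name, link label) pairs and raises ValueError
    when the stack needs more x1 links than are available.
    """
    free = [f"cpu-link-{i}" for i in range(X1_LINKS_FROM_CPU)]
    assignments = []
    for card in stack:
        if not free:
            raise ValueError(f"no x1 link left for {card.name}")
        assignments.append((card.name, free.pop(0)))
        if card.has_switch:
            # Assumption: the replacement link is offered after the remaining
            # CPU links. Cards on it share the bandwidth of the consumed link.
            free.append(f"{card.name}-downstream")
    return assignments


stack = [Card("switch-card", has_switch=True)] + [Card(f"io-{i}") for i in range(4)]
for name, link in assign_x1_links(stack):
    print(f"{name:12s} -> {link}")
```

With four plain x1 cards every card gets its own CPU link; a fifth card fits only if an earlier card carries a switch.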

Taking advantage of the x16 link

The x16 link has options that depend on the CPU and chipset. Users can employ the x16 link as a single full-width x16 PCI Express link. The most popular use for a x16 link is in graphics cards, but watch for 10 GbE, high-end DSP, frame grabber, and FPGA cards. Xilinx supports a x1 PCI Express core in the Spartan-3 FPGA and has a x8 hardware PCI Express endpoint in the Virtex-5 FPGA. Altera offers x1, x4, and x8 MegaCores for the Stratix and Arria FPGAs. Several other FPGA vendors and IP vendors offer PCI Express cores, and these are probably just the tip of the iceberg.

The PCI/104-Express specification also supports two x4 or x8 links with automatic link shifting on the x16 part of the connector. These add-in cards are identified as x4, x8, or x16 and do not affect the four x1 links. Some chipsets also allow the x16 link to be used for other functions such as Serial Digital Video Out (SDVO), which is automatically selected by the CPU and add-in card. Support for these functions is optional, so consult the CPU manual for more information.

PCI Express switches allow vendors to create bridge cards that provide additional uses for the x16 link. A PCI Express switch on the x16 link can replace all four x1 links, break the x16 into two x4 or two x8 links, use a x16 link onboard and replace the x16 link on the bus, and/or create a PCI-104 PCI bus. The switch can do any one of these things or all of them. Granted, everything behind the switch shares the bandwidth of the x16 link. However, vendors can have four x1s and two x4s and still use a x4 on the board without losing any bandwidth, because those links total 4 + 8 + 4 = 16 lanes.

Pseudo-multitasking with the x16 link

For an example of the x16 link’s power and flexibility, consider a simple board that uses a three-port, 48-lane switch with an onboard DSP that requires a x16 link. The board replaces the x16 link on the bus.

In a stack-down configuration, the x16 link comes in the top connector and sends all 16 lanes to one of the switch’s ports. The switch’s second port sends 16 lanes to the onboard x16 link DSP. Lastly, the third port sends 16 lanes to the bottom connector, replacing the x16 link (see Figure 3).

Figure 3

If the system is configured with a CPU and three of the DSP boards depicted in Figure 3 – called DSP 1, DSP 2, and DSP 3 – the boards will be able to perform the following transactions, each assuming nothing else is happening on the x16 link:

  • CPU can talk to DSP 1 at 10,000 MBps
  • CPU can talk to DSP 2 at 10,000 MBps
  • CPU can talk to DSP 3 at 10,000 MBps
  • CPU can talk to any of the DSPs at 5,000 MBps at the same time it talks to any other DSP at 5,000 MBps
  • CPU can talk to all three DSPs at the same time at 2,500 MBps each and still have 2,500 MBps for additional x16 cards
  • CPU can talk to DSP 1 at 10,000 MBps while DSP 2 talks to DSP 3 at 10,000 MBps
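The cases listed above can be checked with a small bandwidth-budget sketch in Python. It rests on assumptions that are not in the specification: the stack is modeled as a chain (CPU, then DSP 1, DSP 2, DSP 3), every x16 hop and every onboard DSP link is treated as one 10,000 MBps pool, and traffic is described as hypothetical `(source, destination, rate)` tuples where position 0 is the CPU.

```python
LINK_CAPACITY_MBPS = 10_000  # x16 figure used in the article


def segment_loads(flows, n_boards=3):
    """Sum the traffic on each x16 hop of a CPU + n_boards stack.

    Position 0 is the CPU and position i is DSP board i. Hop k joins
    position k to position k + 1. Each flow is (src, dst, rate_mbps).
    """
    loads = [0] * n_boards
    for src, dst, rate in flows:
        lo, hi = sorted((src, dst))
        for k in range(lo, hi):   # every hop between the two endpoints
            loads[k] += rate
    return loads


def fits(flows, n_boards=3):
    """True if no hop and no endpoint link is asked for more than its capacity."""
    endpoint = [0] * (n_boards + 1)
    for src, dst, rate in flows:
        endpoint[src] += rate
        endpoint[dst] += rate
    return all(load <= LINK_CAPACITY_MBPS
               for load in segment_loads(flows, n_boards) + endpoint)


# CPU to all three DSPs at 2,500 MBps each: the first hop carries 7,500 MBps.
print(segment_loads([(0, 1, 2500), (0, 2, 2500), (0, 3, 2500)]))
# CPU <-> DSP 1 and DSP 2 <-> DSP 3 use different hops, so both run at full rate.
print(fits([(0, 1, 10_000), (2, 3, 10_000)]))
```

The last case works because peer-to-peer traffic between DSP 2 and DSP 3 never crosses the hop between the CPU and DSP 1.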

This example makes it appear as though the CPU is performing more than one transaction at a time when in reality it is not. PCI Express switches operate like Ethernet switches. A host sends packets of data to the switch at the fastest rate possible. The switch buffers the packets and sends them to a device at the fastest rate the device can handle. These packets can vary in size but are typically no more than 512 bytes.

So, how does a PCI Express switch let a CPU talk to five devices seemingly at the same time? Imagine a host that uses a x16 link. Devices #1 and #2 are x1 link devices, devices #3 and #4 are x4 link devices, and device #5 is a x8 link device. The drawing in Figure 4 shows the timing required for this "simultaneous" operation.
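The interleaving can be sketched with a toy scheduler in Python. It is not the timing from the figure and not how any particular switch arbitrates. The assumptions are: all packets are the same size, one time slot is the time the x16 host link needs to deliver one packet, the switch buffers a single packet per downstream port, and the host serves ready devices in round-robin order. The function name `schedule` is hypothetical.

```python
def schedule(device_widths, host_width=16, slots=16):
    """Return, for each host time slot, the index of the device served (or None).

    A device with link width w needs host_width // w slots to drain one
    packet from the switch, so it cannot accept another until then.
    """
    drain = [host_width // w for w in device_widths]
    busy_until = [0] * len(device_widths)
    timeline = []
    nxt = 0  # round-robin pointer
    for t in range(slots):
        chosen = None
        for offset in range(len(device_widths)):
            d = (nxt + offset) % len(device_widths)
            if busy_until[d] <= t:        # port buffer is free again
                chosen = d
                break
        if chosen is not None:
            busy_until[chosen] = t + drain[chosen]
            nxt = (chosen + 1) % len(device_widths)
        timeline.append(chosen)           # None marks an idle host slot
    return timeline


widths = [1, 1, 4, 4, 8]  # devices #1 to #5 from the text
timeline = schedule(widths)
print(" ".join("-" if d is None else f"#{d + 1}" for d in timeline))
```

The host spends one slot per packet on each device and moves on while the slower links drain, which is why the five transfers appear simultaneous even though the host sends only one packet at a time.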

Figure 4

Using PCI/104-Express

Mechanics of stacking

PCI/104-Express is designed to simplify system configuration. For example, if a designer uses a typical 3.6" x 3.8" (92 mm x 96 mm) 104 form factor CPU, the CPU is placed on top of the stack, as shown in Figure 5. PCI/104-Express add-in boards are then stacked below the CPU. Because all add-in boards are universal cards and employ automatic lane shifting, the designer never has to set switches or jumpers.

Figure 5

If a designer chooses one of the PC/104 Embedded Consortium’s larger form factors such as EPIC or EBX, or uses a PCI/104-Express module as a macro-component on a baseboard, the CPU is typically placed on the bottom of the stack, and add-in cards are stacked on top, as shown in Figure 6. The same cards used in the stack-down configuration work unchanged in this configuration because PCI/104-Express features a universal add-in card design: each card automatically detects whether it is installed above or below the CPU and selects the correct PCI Express link. Because of the frequencies involved with PCI Express, stacking modules both above and below the CPU at the same time is not recommended.

Figure 6

Form factors

The PC/104 Embedded Consortium maintains specifications for three form factors. The original 104 form factor is the readily recognized 3.6" x 3.8" (92 mm x 96 mm) board. EPIC provides a midsized board at 4.5" x 6.5" (115 mm x 165 mm). EBX is the largest at 5.8" x 8" (146 mm x 203 mm). PCI/104-Express is defined on all three form factors, as shown in Figure 7.

Figure 7

Jim Blazer is the vice chairman and Chief Technical Officer of RTD Embedded Technologies, Inc., in State College, Pennsylvania, where he is responsible for managing intelligent data acquisition system and embedded PC designs. He currently serves as chairman of the PC/104 Embedded Consortium’s Technical Committee. Jim has a BSEE from Penn State University.

RTD Embedded Technologies
814-234-8087
jblazer@rtd.com
www.rtd.com

For more information on the PCI/104-Express specification, check out the PC/104 Embedded Consortium website at: www.pc104.org/pci104_Express_specs.php