FPGAs: Tough to program, but key for embedded computing

Embedded computing designers continue to look at ways to cost-effectively integrate FPGAs into their PC/104 and other small form factor designs while battling the time-consuming and difficult task of programming the devices.

Most people in the PC/104 and small form factor world view an FPGA as an expensive option. FPGA vendors have touted their wares as ideal replacements for DSPs, CPUs, and GPUs – even for all of them in a single device – but FPGAs are notoriously difficult for software engineers to program because they are nothing like a conventional processor. Much of the expense comes from the effort of programming the FPGAs, which requires expertise in writing VHDL, and that expertise is getting harder to come by as there are fewer and fewer VHDL programmers.

FPGAs are hardware-centric and were originally programmed using schematic layout tools, very similar to the way PCBs are designed in software today. They are effectively just a bunch of empty transistors, arranged as gates that can be programmed as zero or one – similar to what was possible with a 7400 TTL device in 1966! FPGAs started out as simple logic, then grew into devices with many logic cells, used for pre-processing data on its way from a real-world signal to a CPU (see Figure 1).

Figure 1: Above is the first Sundance FPGA processor module with video I/O – and pre-processor for a TI DSP module, sitting on a PC Bus – dating back to 1999. The other is Sundance’s current product with a Zynq FPGA that is a quarter of the size of the 1999 device.

The devices then evolved into communication processors, moving data fast on parallel and now serial interfaces, resulting in the all-of-the-above devices you see today with an integrated CPU. However, they still have thermal challenges and price issues, as they do not follow the typical downward price curve of silicon, since none of the FPGA vendors have a fab.

The arrival of Verilog and VHDL made it possible for hardware engineers to describe the gates in a readable code format rather than draw them. The concept was – and still is – to create some kind of portability between various generations of FPGAs and even different vendors, similar to typical software languages, and to be able to port designs to custom ICs such as ASICs. This concept is being undermined by vendors’ special features that only work on their own variety of FPGA, so it is not possible to port to another vendor. With each new FPGA family, the manufacturer requires an entirely new set of tools. For example, our Xilinx Virtex 5 solution will not work with the Xilinx Virtex 7, and we have to do everything all over again; even the tools are different. I don’t believe they are doing it to be evil. They do it to improve the product, and the improvements often soften the incompatibility headache, as new FPGA families are often four times faster than their predecessors, while processors only make modest performance gains.

VHDL programming challenges

VHDL expertise is hard to come by in part because, unfortunately, VHDL programming is not a popular topic in university curriculums. VHDL programmers are not graduating from universities by the busload. Kids studying programming typically learn to write in C and, of course, OpenCL, because it’s cool and great with graphics and image processing. When you leave university with a degree that involved graphics, you have something to show an employer.

The popularity of C is behind the industry push to develop good tools for writing code in C and translating it into VHDL. The process does work, as there is a whole tool chain for doing it, but it is very time consuming: you have to simulate in C to get to VHDL, then simulate the VHDL and make sure that it still works, as the resulting bit stream can be considerably longer. Thus, the development time is very long. Ask any software team how they solve a small bug with a quick fix and they will say: “Try; if that doesn’t work, try something else.” The whole experiment might take a few minutes on a CPU-based system, but can take hours with an FPGA solution.

The other – and in my view, totally failed – effort is to try to program FPGAs via OpenCL. Because OpenCL is a flavor of C, it sounds logical to reuse the same tools that enable C-to-VHDL, and that is where it fails. The code goes from OpenCL to C to VHDL, and then it has to be converted into the native bit-stream of an FPGA. Not only is this inefficient in terms of generated code, but it is also very time-consuming. OpenCL is targeted toward graphics processors, and people thought that maybe they could use it for FPGA programming. However, while OpenCL is perfect for GPUs, whose tools are constantly improving (the latest version being 2.x), FPGA tools are only just about ready with the basic features supported in OpenCL 1.x.

The ongoing efforts by vendors to create development tools, either through model-based programming (such as MathWorks, NI, etc.) or through C-to-VHDL translation, have not really made any inroads in the embedded solutions market. The generated designs are not very efficient, and hence are quite costly, as larger FPGAs are required. These tools are very useful for scientific applications, test systems, and research ideas, and are an ideal way to teach students about embedded systems, as they are very graphical, but they are not suited to basic embedded solutions.

So, if all of this is true, that FPGAs are impossible to program, then FPGAs would surely not have any use in embedded systems. Right? Wrong! FPGAs do have value in embedded systems!

The typical embedded solution has a limited power budget and has to be small and fairly rugged. Fans are banned to raise the mean time between failures (MTBF) and reduce maintenance cost, so with about 25 W for the combined solution you can have a winner. That also happens to be the upper limit of the Power-over-Ethernet standard, so a single external cable carrying both power and data is possible.

It should be noted that FPGA designers have also improved the power management characteristics of FPGAs in recent years, especially with the integration of low-power ARM cores. Power management features let you run only the IP cores necessary for a particular function in the FPGA, leaving the others dormant until they are needed.

The typical embedded system-on-chip (SoC), be it an x86, ARM, or MIPS CPU, has every known interface integrated by default, can run a version of Linux or similar, and is cost-effective. One such SoC family is Texas Instruments’ DaVinci DM81xx, which has a 1.5 GHz ARM Cortex-A8 CPU, a 1.5 GHz floating-point DSP, and as many as three integrated H.264 compression engines. These devices offer 1 Gb Ethernet (GbE), USB, SATA, and even a 1080p 3D graphics display engine, and draw a maximum of 10 W while starting at $40.00. Visit www.ti.com for more information.

However, the video ports on these devices are hopeless. They are designed to work with TV-standard video encoders, so they do not support custom cameras that use proprietary video formats; they only support specific frame rates and commodity resolutions, such as 1920 x 1200, at a maximum data rate of 165 MHz. One solution is to add a small FPGA in front of the video ports. The FPGA can connect to any video source through its flexible I/O and use its logic to translate the incoming data into a format that lets the SoC think it is receiving standard video.
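To make the 165 MHz limit concrete, here is a minimal sketch that estimates the pixel clock a video mode needs and checks it against the port limit. It treats the 165 MHz figure as a pixel clock, which is an assumption; the blanking values and the 4096 x 3072 camera are also illustrative assumptions, not DaVinci or Sundance specifications.

```python
# Rough check of whether a video mode fits a fixed pixel-clock limit.
# All timing numbers are illustrative assumptions, not DaVinci specifications.

PORT_LIMIT_HZ = 165_000_000  # the 165 MHz video-port limit quoted in the text


def pixel_clock_hz(h_active, v_active, fps, h_blank=0, v_blank=0):
    """Pixel clock = pixels per frame (active + blanking) * frames per second."""
    return (h_active + h_blank) * (v_active + v_blank) * fps


def fits_port(clock_hz, limit_hz=PORT_LIMIT_HZ):
    """True if the mode can be clocked directly into the video port."""
    return clock_hz <= limit_hz


# 1920 x 1200 at 60 frames/s with assumed reduced-blanking timing (160 x 35).
commodity = pixel_clock_hz(1920, 1200, 60, h_blank=160, v_blank=35)

# Hypothetical custom camera: 4096 x 3072 at 30 frames/s, blanking ignored.
custom = pixel_clock_hz(4096, 3072, 30)

if __name__ == "__main__":
    for name, clk in (("commodity", commodity), ("custom", custom)):
        print(f"{name}: {clk / 1e6:.1f} MHz, fits port: {fits_port(clk)}")
```

Under these assumptions the commodity mode needs roughly 154 MHz and fits, while the hypothetical custom camera needs more than twice the limit, which is the kind of gap an FPGA in front of the port would have to bridge.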

Sundance’s DM8168 development platform is one such design, and in this case the FPGA is small and costs $15.00. The design took a few man-months to implement in pure VHDL. It provides a SuperHD video input (effectively as high as 8000 x 8000, as the FPGA also has video frame buffers/storage) and converts it into the video port interface format that DaVinci understands, so the SoC can perform real-time H.264 compression and transmit via the 1 GbE links. If the design is integrated into the latest PC/104-based blade solution that offers Power-over-Ethernet, passive cooling, and ruggedization, then it is a truly embedded solution (see Figure 2). Visit www.sundance.com for more information.

Figure 2: The DM8168 development platform from Sundance converts incoming video into the video port interface format that DaVinci understands, so the SoC can perform real-time H.264 compression and transmit via the 1 GbE links.
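As a back-of-envelope illustration of why frame buffers matter for an 8000 x 8000 source, the sketch below sizes one frame and estimates how fast it could be drained through a 165 MHz port. One byte per pixel, one pixel per clock, and the idea of splitting a frame into 1920 x 1200 tiles are all assumptions made for illustration; the article does not describe how the Sundance design actually does the conversion.

```python
# Back-of-envelope sizing for a frame buffer in front of a fixed-rate video port.
# Bytes per pixel, one pixel per clock, and tiling are illustrative assumptions.

PORT_CLOCK_HZ = 165_000_000  # 165 MHz port limit quoted in the article


def frame_bytes(width, height, bytes_per_pixel=1):
    """Storage needed to hold one full frame."""
    return width * height * bytes_per_pixel


def max_full_frames_per_second(width, height, clock_hz=PORT_CLOCK_HZ):
    """Upper bound on frames/s if one pixel leaves the buffer per clock."""
    return clock_hz / (width * height)


def tiles_needed(width, height, tile_w=1920, tile_h=1200):
    """Number of tile_w x tile_h windows needed to cover the frame."""
    cols = -(-width // tile_w)  # ceiling division
    rows = -(-height // tile_h)
    return cols * rows


if __name__ == "__main__":
    w, h = 8000, 8000
    print(f"frame size: {frame_bytes(w, h) / 1e6:.0f} MB")
    print(f"max full frames/s: {max_full_frames_per_second(w, h):.2f}")
    print(f"1920 x 1200 tiles per frame: {tiles_needed(w, h)}")
```

With these assumptions a single frame occupies 64 MB and can leave the buffer fewer than three times per second, so a design of this kind has to store, crop, or scale frames rather than stream them straight through.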

Going forward – and how long before my dream happens?

What I hope to see is that FPGAs become dynamically reconfigurable heterogeneous SoCs, with a number of CPUs present where maybe some are 32-bit, some 64-bit, and some even 8-bit! Maybe they are soft cores, with maybe a few dedicated low-power 32-bit DSPs and then an area of the die that is free for users to use as accelerators, using IP cores that can be called upon, if required, by the CPU – or otherwise left idle and in a sleep mode.

The IP cores would be developed and optimized by VHDL/Verilog experts, with a flow that goes from a graphical interface to an interpreter, to C, then to VHDL, and then to a bit-stream. All the I/O should be IP blocks as well, so if you want 100 GPIOs, then you will have them. If you want 40 USB 2.0 interfaces, then you will have them. If you want graphics output, then simply add an HDMI or similar. If you want a frame grabber, then add that.

A considerable amount of effort is being directed into heterogeneous SoC development for embedded solutions by both the academic world and the incumbent semiconductor companies – from small fabless ones to the biggest fish in the pond. It’s critical, as Moore’s Law scaling has ended, but we still need more performance, and multiprocessing is the way forward.

Sundance is part of such efforts in a Pan-European Consortium called FlexTiles (www.flextiles.eu). The fully working emulator of this concept will be presented in April at the Applied Reconfigurable Computing Symposium (ARC’15 – http://arc2015.esit.rub.de/). Give it another three to four years and a FlexTiles look-alike chip could be available in your favourite shopping mall. I shall be in the line-up.

Flemming Christensen is president of Sundance Multiprocessor Technology, which has manufactured SFF modules such as PC/104 or TIM-40 since the late 1980s. Sundance continues to develop generic modules that comply with non-proprietary, open industry standards. They cover typical high-speed on-module communication systems that connect analog signals to digital signals, typically using LVDS signaling. Christensen is also a member of the PC/104 Consortium.

Sundance Multiprocessor Technology www.sundance.com