Optimizing Java Performance in Heritage Designs

Java, in its J2ME guise, has all the attributes of a first-rate platform for embedded system design. Its platform independence, code portability, and robust operation make it particularly well suited to such applications. The future of embedded Java-based devices is secured by the proliferation of standards built on the platform and by the endorsement of major OEMs committed to using it in their designs.

It's become clear that the potential marketplace for embedded Java devices is vast, but some of these markets are not yet mature. Successful manufacturers in the immediate market for embedded devices, such as wireless handsets and set-top boxes, have a huge investment in legacy code that they, not unreasonably, wish to retain. Together with the problem of achieving acceptable performance in resource-constrained environments, this migration to Java-enabled devices in established markets built on other technologies is the most significant barrier to the widespread adoption of Java as the de facto standard in the embedded space.

Of the emerging solutions, both hardware- and software-based, none can claim to be a panacea. This article discusses the introduction of Java into multilanguage heritage designs, focusing on the advantages and disadvantages of deploying each solution.

Java Bytecode Execution in an Embedded Environment
Obviously, some platforms will be more proficient than others at executing Java code. The issue is clouded by hype, but, fundamentally, Java bytecode can be executed in one of three ways: software translation, hardware translation, or direct execution.

Translation in Software: The Java Virtual Machine
Bytecode can be executed using a software Java Virtual Machine (JVM) or, more specifically, a KVM designed particularly for embedded devices. Java code can be executed on any such virtual machine. A JVM takes compiled Java code (bytecode) and translates it into the native machine code of the processing platform in question before executing it. Indeed, this process of interpretation is central to the Java concept of platform independence.
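The interpretation loop at the heart of a software JVM can be sketched in a few lines. The toy interpreter below handles just enough opcodes to show the translate-then-execute cycle; the opcode values match the JVM specification, but everything else (no frames, no exceptions, a fixed-size operand stack) is a simplification for illustration only.

```java
// Toy sketch of a JVM interpreter loop. Real VMs handle ~200 opcodes,
// method frames, and exceptions; this handles four opcodes on an int stack.
public class ToyInterpreter {
    // Opcode values as defined by the JVM specification.
    static final int ICONST_1 = 0x04, ICONST_2 = 0x05, IADD = 0x60, IRETURN = 0xAC;

    static int execute(byte[] bytecode) {
        int[] stack = new int[16];   // operand stack
        int sp = 0;                  // stack pointer
        int pc = 0;                  // program counter
        while (pc < bytecode.length) {
            int opcode = bytecode[pc++] & 0xFF;
            switch (opcode) {        // each bytecode maps onto several native operations
                case ICONST_1: stack[sp++] = 1; break;
                case ICONST_2: stack[sp++] = 2; break;
                case IADD:     { int b = stack[--sp]; stack[sp - 1] += b; break; }
                case IRETURN:  return stack[--sp];
                default: throw new IllegalStateException("unhandled opcode " + opcode);
            }
        }
        throw new IllegalStateException("fell off end of method");
    }

    public static void main(String[] args) {
        // Bytecode equivalent of: return 1 + 2;
        byte[] code = { ICONST_1, ICONST_2, IADD, (byte) IRETURN };
        System.out.println(execute(code)); // prints 3
    }
}
```

The dispatch overhead visible here (fetch, mask, switch) on every single bytecode is precisely why interpreted execution needs a fast host processor to achieve acceptable performance.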

Translation in Hardware: Bytecode Accelerators
A bytecode accelerator is a hardware solution that uses the resources of an existing host processor. Accelerator solutions don't execute Java bytecode directly; instead, they convert the bytecode (in hardware) into the native instructions of the host processor prior to execution. Invariably, such solutions also utilize a software-based JVM, modified by the replacement of the main interpreter loop and execution unit with the bytecode accelerator.

Native Java Processors
Native Java processors are microprocessors designed to execute bytecode directly as their native instruction set. They can be deployed as a coprocessor to a host processor in a multilingual, multiprocessor system, or as a standalone solution in a dedicated embedded Java design.

Embedded Multiprocessor Java Solutions
While there's a clear desire for Java capabilities to be introduced into many embedded applications, it's a prerequisite that Java bytecode is executable in parallel with existing heritage code, rather than in place of it. Primary examples of such applications would be a mobile phone running a C-coded communications stack, or a set-top box currently evolving to support interactive or Internet-based content. Understanding the fundamental design issues is vital when designing high-quality embedded devices for Java-based applications. Characteristics that influence the selection of components for any embedded system include:

  • Resources
  • Performance
  • Ease of integration
  • Cost

First, JVMs are inherently resource hungry. This is a corollary of the software interpretation layer, which abstracts the code from the processor upon which it is executed. A JVM will typically map a single Java bytecode onto several native processor instructions prior to execution; therefore, a very fast processor is required to sustain acceptable Java performance. Relatively speaking, the rise in silicon cost and power consumption intrinsic to the use of such powerful processors is huge. Additional memory resources, occupied by the JVM itself, place a further burden on embedded applications.

Bytecode accelerators also use a JVM and so require the same additional memory resources as software-only JVM solutions. Typical bytecode accelerators are efficient in terms of silicon cost when added to an existing host processor; however, if a second dedicated processor is used, the gate count of this additional processor must also be taken into account. Native Java processors vary drastically in size: those that are stack-based, and thus closely match the Java execution model, have a very low silicon cost, whereas those built around a standard RISC core are less than optimal in this respect.

Since there are still no dependable metrics available to evaluate the performance of embedded Java solutions, code execution speed remains an emotive issue. When applied prudently, benchmarks are an invaluable asset. However, they're not the sole criterion for evaluation and must be regarded with caution, since ultimately the crucial point is how fast the platform can execute the end application code. CaffeineMark figures are widely quoted but are not representative of real applications. It's hoped that the imminent arrival of the EEMBC industry-standard benchmarks will clarify the issue, as discussed in my previous article "J2ME Benchmarking: A Review" (JDJ, Vol. 7, issue 1).

Generally speaking, solutions that rely on translating Java bytecode into one or more native instructions, whether by a hardware or a software interpretation process, will execute code much more slowly than solutions that can execute the bytecode directly. Native Java processors execute the vast majority of bytecodes directly. More complex instruction types can be microcoded (i.e., they follow a number of internally coded steps), or, where this is not practical, a jump to a predefined software routine is invoked (see Figure 1).

Register-rich hardware solutions (e.g., bytecode accelerators or, similarly, those native processors based on RISC cores) will suffer a further performance impact resulting from the need to preserve the state of the registers during the frequent context switches that are a feature of a threaded language like Java.

JVMs are available for most processors and are the most expedient way to enable Java capability on an existing platform. However, this approach is inefficient and poorly aligned with the J2ME paradigm, as the performance-versus-resources trade-off is difficult to justify for embedded devices. Bytecode accelerators are specifically designed to operate alongside a host processor and are relatively easy to integrate. Furthermore, they execute Java bytecode more rapidly than the pure software JVM solutions they replace. However, this is still at the expense of a reduction in the host processor's available bandwidth for other functions (e.g., communications for an interactive application), as a result of the extra processing burden placed on it.

Native Java processors can execute Java bytecode at optimal speeds and do not place any extra burden on the host processor if deployed as a coprocessor, since they can operate concurrently. Taking everything into consideration, there's a clear migration path (probably time-line dependent) from "easy-to-integrate" JVM solutions through bytecode accelerators to the ultimate performance offered by native Java processors.

How simple is it to integrate a native Java processor with an existing host core? The answer, of course, depends on the design of the processor. The final part of this article explains such a design in more detail.

Finally, though licensing costs are somewhat tangential to this discussion, they're worth a mention since they're an important concern for devices produced in high volume. While cost is very much a vendor-specific issue, it's worth pointing out that solutions that utilize both a JVM and hardware intellectual property will incur license fees for both resources.

Integrating a Native Java Processor into a Multiprocessor System
The integration of a Java processor as a loosely coupled coprocessor can be simplified by the addition of a few extra features, including:

  • An industry-standard bus interface
  • Relocation support for the core memory map
  • Host processor communication support

Externally, the Java processor must present an industry-standard bus interface (e.g., AMBA, AHB, MLB) to simplify integration of the processor with the host CPU (see Figure 2). In a coprocessor scenario, both processors are declared bus masters. Since they're able to process data concurrently and are completely independent of each other, conflicts may occur when both processors request bus access simultaneously. Ultimately, in such circumstances, the decision of which processor takes priority lies with the bus arbiter and is defined by the systems integrator at design-time. Code caches are an important feature of any coprocessor implementation. Their importance lies in the fact that not only do they reduce code access times, but they also limit system bus access and so reduce bus contention.

By default, and upon reset, a standalone processor would sensibly execute code from the first location in memory. However, in a multiprocessor system, it must be possible to relocate the program counter to allow the host to redirect the vectors for external instructions to an appropriate location in the physical address map. This could be achieved, for example, by reconfiguration of an index register.

Low-level support must also be provided for interprocessor communications. In the example described here, this is achieved using two mailbox registers: one for communication from the Java processor to the host, the other for communication in the reverse direction. A command packet written by the sending processor to its mailbox generates an interrupt to inform the recipient processor that a new value is available. A further interrupt is then generated to inform the sending processor that the recipient has read the value. By inspecting the mailbox, the recipient processor can determine the nature of the request, which could be a method call, a data transfer, or a reference to a multimedia object.

Java coprocessor solutions that are currently market-ready use one of two approaches to implement data transfer, depending on whether the processor requires dedicated memory resources or supports shared access to system memory. Ideally, system memory can double as a communications area, with an independent memory location accessible to both processors used to transfer data. Otherwise, where the processor does not support shared memory, a FIFO buffer can provide a data transfer path, though this increases the complexity of the design.
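The mailbox handshake described above can be modeled in software. The sketch below simulates one mailbox register and its two interrupts as plain Java callbacks; the packet layout (high byte = request type, low bytes = argument) and all names are illustrative, not taken from any vendor's register map.

```java
// Software model of the two-interrupt mailbox handshake. Hardware would
// implement the mailbox as a memory-mapped register and the callbacks as
// interrupt lines; the packet encoding here is an invented example.
public class MailboxDemo {
    /** One mailbox register plus its "value written" / "value read" interrupts. */
    static class Mailbox {
        private Integer packet;            // null = mailbox empty
        Runnable onWrite, onRead;          // stand-ins for the two interrupt lines

        void write(int commandPacket) {
            packet = commandPacket;
            if (onWrite != null) onWrite.run();  // interrupt the recipient processor
        }
        int read() {
            int value = packet;
            packet = null;
            if (onRead != null) onRead.run();    // interrupt the sender: slot is free again
            return value;
        }
    }

    public static void main(String[] args) {
        Mailbox javaToHost = new Mailbox();
        // Host side: service incoming command packets when "interrupted".
        javaToHost.onWrite = () -> {
            int packet = javaToHost.read();
            int command  = packet >>> 24;        // illustrative: high byte = request type
            int argument = packet & 0xFFFFFF;    // illustrative: low bytes = argument
            System.out.println("host: command " + command + ", arg " + argument);
        };
        // Java-processor side: send request type 0x01 with argument 7.
        javaToHost.write((0x01 << 24) | 7);
    }
}
```

A second Mailbox instance, with the roles reversed, would carry traffic from the host back to the Java processor.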

Ultimately, a J2ME application programmer shouldn't need to care about the hardware resources and, indeed, from an abstract point of view, there will be little or no difference between Java code developed for single or multiprocessor solutions. As an example, let's assume that the Java code wishes to make use of a set-top box resource supported by the host processor, such as the tuner. This resource would be accessible only via a Tuning API, such as the one specified in the DVB Multimedia Home Platform standards. In this scenario, a standard Java method could trigger a request (passed via mailbox registers) to the host, passing arguments to indicate which channel is required. Once the operation had been carried out, the host would signal to the Java processor, again via a mailbox register, that the request had been successfully completed (or otherwise), and that the selected channel was available.
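At the Java level, such a tuner request could be wrapped in an ordinary method so that application code never sees the mailbox at all. The sketch below is hypothetical: the command code, packet layout, and the NativeMailbox abstraction are invented for illustration, and a real port would bind them to the platform's actual register map and Tuning API.

```java
// Hypothetical proxy mapping a Java-level tuning call onto a mailbox
// transaction. Command code and packet layout are invented examples.
public class TunerProxy {
    static final int CMD_TUNE  = 0x02;   // illustrative request-type code
    static final int STATUS_OK = 0;      // illustrative host reply code

    /** Stand-in for the hardware mailbox pair described in the text. */
    interface NativeMailbox {
        void send(int commandPacket);    // write packet + interrupt the host
        int awaitReply();                // block until the host's reply interrupt
    }

    private final NativeMailbox mailbox;
    TunerProxy(NativeMailbox mailbox) { this.mailbox = mailbox; }

    /** Ask the host processor to select a channel; returns true on success. */
    boolean tune(int channel) {
        mailbox.send((CMD_TUNE << 24) | (channel & 0xFFFF));
        return mailbox.awaitReply() == STATUS_OK;
    }
}
```

From the application programmer's point of view, `tune(5)` is just a method call; whether it is serviced on the same processor or marshaled across a mailbox to the host is invisible, which is exactly the abstraction the J2ME model intends.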

Similarly, the process of debugging Java application code is as simple on a multiprocessor platform as it is on a single processor. This can be accomplished using standard protocols, such as the KVM Debug Wire Protocol (KDWP), to interface the Java processor directly to a development and debug environment such as Forte. In this instance, a JTAG port would be used to write arbitrary locations in memory (i.e., to send command packets) or read from them (i.e., to receive reply packets). Alternatively, debugging can be carried out via the host processor, using the mailbox registers to communicate between the two processors, as described earlier.

This article discussed issues that pertain to the selection of a Java solution for devices with a significant investment in heritage code. Moreover, following a clear migration path from virtual machines to embedded hardware solutions, the article also discussed the practical implementation of a dedicated hardware coprocessor solution. It's probable that all the solutions described, from the easiest to integrate to those offering the ultimate performance, will be deployed in multilingual, multiprocessor systems long before single-language devices are upon us.

More Stories By Carl Barratt

Carl Barratt works in applications support for Vulcan Machines. He has over seven years of experience in various hardware and software development roles. Carl holds a BEng (Hons) degree in electronic engineering and has undertaken PhD research with the University of Nottingham.
