Review of CHES 2000, August 17-18, 2000, by Joe Marconis

A Summary of CHES 2000
Workshop on Cryptographic Hardware and Embedded Systems
August 17-18, 2000

Joe Marconis
InterTrust Technologies Corp.
Jmarconis@intertrust.com

CHES 2000 was held at Worcester Polytechnic Institute (WPI) in Worcester, Massachusetts. It was the second year for this event which provides a forum for combining theory and practice for integrating strong data security into modern communications and e-commerce applications. As the name of the workshop implies, the focus is on cryptographic hardware and embedded systems design. There were almost 200 attendees for the event, more that half coming from outside the USA. The workshop Chairs, Çetin Kaya Koç and Christof Paar, did an excellent job of putting together an interesting and successful workshop. CHES 2001 will be held in Paris, France May13-16, 2001. For more information go to www.cheswokshop.org

During his opening remarks Christof Paar pointed out that WPI’s involvement with cryptography dates back to the beginning of the 20^th century when Gilbert Vernum, WPI class of 1914, invented the automated XOR stream cipher.

Opening remarks by Çetin Kaya Koç highlighted the increasing importance of and interest in the topics covered by CHES 2000. He presented three major reasons for this trend:

The explosive growth in popularity of portable information appliances such as pagers, phones, Personal Digital Assistants (PDA’s), etc.

The passing of Y2K, which was diverting IT resources away from dealing with issues such as Public Key Infrastructure (PKI).

The realization by industry that Digital Rights Management (DRM), especially for portable devices, is a major issue that needs to be addressed.

The remainder of this report are summaries of the sessions I attended during the two day workshop. Authors are listed in square brackets at the end of each summary. Invited talks were about 40 minutes, and presentations were 20 minutes in length. Your humble narrator apologizes in advance for any errors or omissions, and is open to any corrections or other useful feedback.

The first presentation was an invited talk by Alfred Menezes titled “Elliptic curve cryptography in constrained environments”. The overall focus of this talk was to demonstrate that significant performance improvements can be achieved by use of projective coordinates over affine coordinates, implementing algorithms for Koblitz curves is straightforward and point multiplication is much faster than for random curves. Cost estimates were presented for various point multiplication methods in terms of both elliptic curve and field operations. Empirical timing measurements were also presented for a selection of NIST elliptic curves over binary fields.

“Implementation of an elliptic curve cryptographic (ECC) coprocessor over GF(2^m)” presented results for an FPGA (EPF10K250AGC599-2, ALTERA) and also for simulations to evaluate an LSI implementation. The implementation was based on a new configuration for a multiplier over GF(2^m) that can operate for arbitrary irreducible polynomials at any bit length. This approach uses a new data conversion method in which data is converted, followed by a series of multiplications and then calculation of the final result is done by inverse conversion. For the FPGA, a 163 bit multiplication running at 3MHz it took 80ms on a pseudo-random curve, and 45ms on a Koblitz curve. Simulation results for a .25um ASIC with 165K gates running at 66MHZ were 1.1ms for pseudo-random curve and 0.65 seconds on a Koblitz curve. [Souichi Okada, Naoya Torii, Kouichi Itoh, & Masahiko Takenaka]

"A high-performance reconfigurable elliptic curve processor (ECP) for GF(2^m)" This talk introduced a new architecture that is targeted for programmable & reconfigurable hardware (e.g. FPGA’s) which enables the ability to deliver optimized solutions for different elliptic curves and finite fields. Claimed benefits of this approach include architecture efficiency, scalability, ease of reconfiguration as needed, and resource efficiency/reuse (e.g. using the same device for both public and private key algorithm execution). Three prototypes were implemented using a Xilinx XCV400E-8-BG432 FPGA. Performance of these ECP prototypes was compared against several leading hardware and software implementations. Results demonstrated that the fastest ECP prototype implementation for computing a point multiplication in the field GF(2¹⁶⁷) was 19 times faster than documented hardware implementations and 37 times faster than documented software implementations. [Gerardao Orlando and Christof Paar]

"Fast Implementation of elliptic curve defined over GF(p^m) on CalmRISC with MAC2424 coprocessor" This talk presented optimized finite field and elliptic curve algorithms which would allow cryptographic functions to be run on high performance devices where most instructions take only one cycle. For field multiplication and squaring, Karatsuba-Offman, row major method and column major method algorithms were considered. Column major was chosen to take advantage of MAC2424’s ability to multiply and accumulate in one cycle. For field inversion, an improved Bailey-Paar (BP) inversion was chosen over Inversion with Multiplication (IM). A mixed coordinates system via Lim-Hwang’s Modified Jacobian coordinates was used to help accelerate elliptic curve exponentiation. Measurements of implementations of the above algorithms demonstrated a 10% improvement over a single coordinate system. [Jae Wook Chung, Sang Gyoo Sim & Pil Joong Lee]

"Protecting smart cards from passive power analysis with detached power supplies” presented a new and inexpensive approach to preventing attacks based on monitoring and analysis of a smart card’s power consumption curve. The proposed solution is based on the use of two capacitors to provide power isolation from the outside world. These two capacitors along with four power transistors and a switch control circuit are used in a circuit which is configured such that the smart card chip is always powered by at least one capacitor, but the external power supply is never connected directly to the chip. It was pointed out that this approach does not deal with all passive power analysis issues (e.g. power leakage over the I/O line) or active attacks (e.g. physically remove or bypass the capacitors). It was stated that cost of this approach should only be a few cents per card. [Adi Shamir]

"Smartly analyzing the simplicty and power of simple analysis on Smartcards" discussed how simple power analysis (SPA) can be can be used to extract sensitive information using a single power consumption graph where care was taken to shield from noise. It was pointed out that the main advantage of SPA over differential power analysis is the small number of samples required and the minimal degree of device corruption. Although SPA requires knowledge of the executing code, that information can be extracted with relative ease by an experienced attacker/hacker using tools such as microprobes. Experiments performed by directly monitoring power dissipation of a PIC16C84 chip while executing a set of test routines demonstrated that it was possible to extract Hamming weights of data and transition counts between data items being written to registers or memory even with simple instructions such as register moves. [Rita Mayer-Sommer]

"Power analysis attacks and algorithmic approaches to their countermeasures for Koblitz curve cryptosystems" presented potential countermeasures for both simple power analysis (SPA) and differential power analysis (DPA) attacks on cryptosystems that use scalar multiplication on Koblitz curves. For SPA the countermeasures rely on making the power consumption for the elliptic curve scalar multiplication independent of the secret key. Countermeasures for DPA rely on randomizing the key before each execution of the scalar multiplication. Three DPA countermeasures were presented: key masking with localized operations, random rotation of key and random insertion of redundant symbols. The three DPA countermeasures presented could be used together to maximize resistance to attack. [M.A. Hasan]

"A timing attack against RSA with the Chinese Remainder Theorem" (CRT) presented a new kind of timing attack that enables factorization of an RSA modulus n if the exponentiation with the secret exponent uses the CRT while the multiplications and squarings modulo the prime factors p1 and p2 are done using Montgomery’s algorithm. While the standard variant of this attack assumes both exponentiations use a simple square and multiplication algorithm, a less efficient attack can be applied to more advanced exponentiation algorithms. The most straightforward countermeasure is to carry out an extra reduction within each multiplication. Use of blinding techniques provides a more general countermeasure to timing attacks. [Werner Schindler]

"A comparative study of performance of AES final candidates using FPGA’s" compared performance of MARS, RC6, Rijndael, Serpent, and Twofish running on FPGA’s (Xilinx Virtex). It also compared these measurements with software implementations and those carried out by the NSA on ASIC’s. Overall, results indicated that Rijndael and Serpent did best on FPGA’s. Rjndael did best in key setup latency, throughput and hardware utilization (throughput per area unit). Rinjdael provided the best time performance across all three platforms. [Andreas Dandalis, Viktor K. Prasanna, Jose D. P. Rolim]

"A dynamic FPGA implementation of the Serpent Block Cipher" described the implementation of Serpent (bitslice mode) in a Xilinx Virtex FPGA using Jbits (a Java-based configuration API for Virtex developed by Xilinx). Jbits allows several levels of abstraction for defining circuits and increases ease of integration for systems that are partitioned between hardware and software. This dynamic implementation was reported to achieve throughput of over 10 Gbits/sec. It was twice the speed and half the size of a static implementation and had lower power consumption and fewer package pins. [Cameron Patterson]

"A 12 Gbps DES Encryptor/Decryptor core in an FPGA" was presented as the fastest implementation to date. It was implemented in Xilinx XCV300 and XCV300E devices. Several different design optimizations were tried in order to get maximum performance. One that provided a significant boost was to apply understanding of the algorithm to force preferred mapping of the logic. Changing pipelining of the key also contributed to increased performance. The next step the authors plan to take is to reduce delay due to interconnect routing by doing manual placement and floor planning. [Steve Trimberger, Raymond Pang, Amit Singh]

"A 155 Mps triple-DES network encryptor" described a single chip running at 250 MHz that could concurrently encrypt and decrypt two 155 Mbps data streams using triple-DES in outer CBC mode. It was stated to be the first implementation to achieve this level of speed and cryptographic strength. A combination of full-custom and standard-cell designs along with a standard 0.6 um CMOS process were used. The chip was also tested in a real world environment by inserting it into a modified ATM Network Interface card. Future work will include expansion of CAM/RAM to allow more virtual circuits and addition of support for a microcontroller interface. [Herbert Leitold, Wolfgang Mayerwieser, Udo Payer, Karl Christian Posch, Reinhard Posch, Johannes Wolkerstorfer]

"An energy efficient reconfigurable public-key cryptography processor architecture" presented a possible approach to reducing the energy consumption of cryptographic functions in energy constrained environments such as handheld devices. This would be done by implementing a Domain Specific Reconfigurable Cryptographic Processor (DSRCP). A DSRP’s reconfigurability is limited to a domain of functions required for asymmetric cryptography. This reduces the overhead in terms of performance, energy consumption and reconfiguration time. Estimates that were presented showed a potential of 30 to 180 times greater energy efficiency over conventional FPGA’s. [James Goodman, Anantha Chandrakasan]

"High speed RSA hardware based on Barret’s Modular Reduction Method" presented the design considerations for a high-speed hardware accelerator for long integer modular exponentiation. It uses an optimized version (via hardware) of Barret’s modular reduction method. The chip uses a partial parallel multiplier to achiever a high degree of parallelism in the multiplier core and can decrypt at a rate of up to 2 Mb/sec when exploiting the Chinese Remainder Theorem. [Johann Grobschadl]

"Data integrity in hardware for modular arithmetic" addressed the issue of fault detection while performing arithmetic in hardware. Presented were practical methods for checking correctness of computations required for the RSA cryptosystem and Diffie-Hellman key exchange. One technique presented was a modular residue check that has a high probability of finding a random or intermittent arithmetic fault and can also find the majority of permanent faults (e.g. logic faults and fabrication errors). [Colin D. Walter]

"A design for modular exponentiation coprocessor in mobile telecommunication terminals" presented a design for a coprocessor suitable for implementing public key cryptography in a mobile telecom terminal. Three requirements were specified: concurrent double modular exponentiation at high speed, small form factor and low power consumption, and resistance to side channel attacks. right-to-left binary exponentiation algorithm was chosen and a novel circuit configuration and schedule control method for doing the double modular exponentiation calculations were presented. [Takehiko Kato, Satoru Ito, Jun Anzai, Natsume Matsuzaki]

"How to explain side channel leakage to your kids" was an invited talk by David Naccache that presented an entertaining overview of side-channel attack techniques at a level suitable for your senior managers (or your kids) who are not fluent in this area. It included timing attacks, power attacks, and fault generation. A copy of the slides are available from David Naccache. Also presented was a discussion about the application of deconvolution to DPA desynchronization countermeasures, for which there will be a paper published in the near future [David Naccache, Michael Tunstall]

"On Boolean and arithmetic masking against differential power analysis" presented an overview of differential power analysis and the types of countermeasures that have been proposed. The focus was on showing that the Boolean masking to arithmetic masking conversion algorithm proposed by Messerges as a countermeasure to DPA is potentially susceptible to DPA [Jean Sebastien Coron, Louis Goubin]

"Using second-order power analysis to attack DPA resistant software" used experiments performed on an ST16 smartcard to demonstrate that first and second order DPA attacks could be done. The author used a basic leakage model based on Hamming weight to show that a data-whitening routine could be subject to a successful 1^st order DPA attack. A hardened version of this routine was then shown to be susceptible to a 2^nd order DPA attack. [Thomas S. Messerges]

"Differential power analysis in the presence of hardware countermeasures" was a discussion of methods for overcoming hardware based DPA countermeasures, focusing on random process interrupts and noisy power consumption. One type of proposed attack involved applying a sliding window to the DPA attack described in Kocher’s “Differential Power Analysis”. A variant using Hamming integration was also presented. [Christophe Clavier, Jean-Sebastian Coron, Nora Dabbous]

At this point there were three talks on the subject of “Arithmetic Architectures”. Unfortunately, I was unable to attend, so the three talks presented are not included in this report. The web site listed above provides information on these presentations.

"Physical Security Devices for Computer Subsystems: A Survey of Attacks and Defense" presented a survey of physical attacks and defenses ranging from cheap and easy to extremely costly and complex. Physical attacks described included different types of probing (active, passive, injector, pico-probes, energy probes), machining away material, application of radiation and different types of electromagnetic energy, and even the use of shaped charges(!). Corresponding countermeasures for these attacks were also described including various types of physical barriers, ways to determine if tampering took place, and self destruct mechanisms. [Steve H. Weingart]

"Software-hardware tradeoffs: application to A5/1 cryptanalysis" discussed how a combined hardware/software approach to doing cryptanalysis can provide a more efficient implementation than hardware or software alone. An analysis of A5/1 was done with a combination of Xilinx FPGA’s on a Pamette card and software running on an Alpha XP-1000 workstation. Performance measurements were presented which demonstrated a significant gain when using the software/hardware combination. The authors state that this approach makes it possible to decrypt a GSM communication in a realistic interception scenario. [Thomas Pornin, Jacques Stern]

"MiniPASS: Authentication and Digital Signatures" in a constrained environment described an implementation of the PASS algorithm for execution on a smartcard. An overview of the PASS scheme was presented followed by a description of the implementation (MiniPASS) developed for this constrained environment. A sample implementation done in C on a workstation was presented to show the expected performance characteristics and size requirements. [Jeffrey Hoffstein, Joseph H. Silverman]

Last but not least, “Efficient generation of prime numbers” presented algebraic methods which substantially reduce the value of hidden constants, allowing a significant improvement in the efficiency of prime number generation. These methods were applied to DSA, strong, and ANSI X.9.31 primes. An almost 10 fold reduction in the number of rounds for Boneh and Franklin’s shared RSA keys protocol was demonstrated. There was also discussion of application of these methods to fast implementations of RSA on smartcards. [Marc Joye, Pascal Paillier, Serge Vaudenay]