Joe Marconis
InterTrust Technologies Corp.
Jmarconis@intertrust.com
CHES 2000 was held at Worcester
Polytechnic Institute (WPI) in Worcester, Massachusetts. It was the second
year for this event which provides a forum for combining theory and practice
for integrating strong data security into modern communications and e-commerce
applications. As the name of the workshop implies, the focus is on cryptographic
hardware and embedded systems design. There were almost 200 attendees for the
event, more that half coming from outside the USA. The workshop Chairs,
Çetin Kaya Koç and Christof Paar, did an excellent job of putting together an
interesting and successful workshop. CHES 2001 will be held in Paris,
France May13-16, 2001. For more information go to
www.cheswokshop.org
During his opening remarks Christof Paar pointed out that WPI’s involvement with
cryptography dates back to the beginning of the 20th
century when Gilbert Vernum, WPI class of 1914, invented the automated XOR
stream cipher.
Opening remarks by Çetin Kaya Koç highlighted the increasing importance of and
interest in the topics covered by CHES 2000. He presented three major reasons
for this trend:
The first presentation was an invited talk by Alfred Menezes titled
“Elliptic curve cryptography in constrained environments”.
The overall focus of this talk was to demonstrate that significant
performance improvements can be achieved by use of projective coordinates over
affine coordinates, implementing algorithms for Koblitz curves is
straightforward and point multiplication is much faster than for random curves.
Cost estimates were presented for various point multiplication methods in terms
of both elliptic curve and field operations. Empirical timing measurements were
also presented for a selection of NIST elliptic curves over binary fields.
“Implementation of an elliptic curve cryptographic (ECC) coprocessor over
GF(2m)” presented results for an FPGA (EPF10K250AGC599-2, ALTERA)
and also for simulations to evaluate an LSI implementation. The implementation
was based on a new configuration for a multiplier over GF(2m) that
can operate for arbitrary irreducible polynomials at any bit length.
This approach uses a new data conversion method in which data is converted,
followed by a series of multiplications and then calculation of
the final result is done by inverse conversion. For the FPGA, a 163 bit
multiplication running at 3MHz it took 80ms on a pseudo-random curve, and 45ms
on a Koblitz curve. Simulation results for a .25um ASIC with 165K gates running
at 66MHZ were 1.1ms for pseudo-random curve and 0.65 seconds on a Koblitz curve.
[Souichi Okada, Naoya Torii, Kouichi Itoh, & Masahiko Takenaka]
"A high-performance reconfigurable elliptic curve processor (ECP) for
GF(2m)" This talk introduced a new architecture that is targeted
for programmable & reconfigurable hardware (e.g. FPGA’s) which enables the
ability to deliver optimized solutions for different elliptic curves and finite
fields. Claimed benefits of this approach include architecture efficiency,
scalability, ease of reconfiguration as needed, and resource efficiency/reuse
(e.g. using the same device for both public and private key algorithm
execution). Three prototypes were implemented using a Xilinx XCV400E-8-BG432
FPGA. Performance of these ECP prototypes was compared against several leading
hardware and software implementations. Results
demonstrated that the fastest ECP prototype implementation for computing a point
multiplication in the field GF(2167) was 19 times faster than
documented hardware implementations and 37 times faster than documented software
implementations. [Gerardao Orlando and Christof Paar]
"Fast Implementation of elliptic curve defined over GF(pm) on
CalmRISC with MAC2424 coprocessor" This talk presented optimized finite
field and elliptic curve algorithms which would allow cryptographic functions
to be run on high performance devices where most instructions take only one cycle.
For field multiplication and squaring, Karatsuba-Offman, row major method and
column major method algorithms were considered. Column major was chosen to take
advantage of MAC2424’s ability to multiply and accumulate in one cycle.
For field inversion, an improved Bailey-Paar (BP) inversion was chosen over
Inversion with Multiplication (IM). A mixed coordinates system via Lim-Hwang’s
Modified Jacobian coordinates was used to help accelerate elliptic curve
exponentiation. Measurements of implementations of the above algorithms
demonstrated a 10% improvement over a single coordinate system. [Jae Wook Chung,
Sang Gyoo Sim & Pil Joong Lee]
"Protecting smart cards from passive power analysis with detached power
supplies” presented a new and inexpensive approach to preventing attacks
based on monitoring and analysis of a smart card’s power consumption curve.
The proposed solution is based on the use of two capacitors to provide
power isolation from the outside world. These two capacitors along with four
power transistors and a switch control circuit are used in a circuit which is
configured such that the smart card chip is always powered by at least one
capacitor, but the external power supply is never connected directly to the
chip. It was pointed out that this approach does not deal with all passive
power analysis issues (e.g. power leakage over the I/O line) or active attacks
(e.g. physically remove or bypass the capacitors). It was stated that cost of
this approach should only be a few cents per card. [Adi Shamir]
"Smartly analyzing the simplicty and power of simple analysis on
Smartcards" discussed
how simple power analysis (SPA) can be can be used to extract sensitive
information using a single power consumption graph where care was taken to
shield from noise. It was pointed
out that the main advantage of SPA over differential power analysis is the small
number of samples required and the minimal degree of device corruption.
Although SPA requires knowledge of the executing code, that information
can be extracted with relative ease by an experienced attacker/hacker using
tools such as microprobes. Experiments
performed by directly monitoring power dissipation of a PIC16C84 chip while
executing a set of test routines demonstrated that it was possible to extract
Hamming weights of data and transition counts between data items being written
to registers or memory even with simple instructions such as register moves.
[Rita Mayer-Sommer]
"Power analysis attacks and algorithmic approaches to their countermeasures
for Koblitz curve cryptosystems" presented potential countermeasures for
both simple power analysis (SPA) and differential power analysis (DPA) attacks
on cryptosystems that use scalar multiplication on Koblitz curves. For SPA
the countermeasures rely on making the power consumption for the elliptic
curve scalar multiplication independent of the secret key. Countermeasures
for DPA rely on randomizing the key before each execution of the scalar
multiplication. Three DPA countermeasures were presented: key masking with
localized operations, random rotation of key and random insertion of redundant
symbols. The three DPA countermeasures presented could be used together to
maximize resistance to attack. [M.A. Hasan]
"A timing attack against RSA with the Chinese Remainder Theorem"
(CRT) presented a new kind of timing attack that
enables factorization of an RSA modulus n if the exponentiation with the secret
exponent uses the CRT while the multiplications and squarings modulo the prime
factors p1 and p2
are done using Montgomery’s algorithm. While the standard variant of this
attack assumes both exponentiations use a simple square and multiplication
algorithm, a less efficient attack can be applied to more advanced
exponentiation algorithms. The most
straightforward countermeasure is to carry out an extra reduction within each
multiplication. Use of blinding
techniques provides a more general countermeasure to timing attacks. [Werner
Schindler]
"A comparative study of performance of AES final candidates using FPGA’s"
compared performance of MARS, RC6, Rijndael, Serpent, and Twofish running on
FPGA’s (Xilinx Virtex). It also compared these measurements with software
implementations and those carried out by the NSA on ASIC’s. Overall,
results indicated that Rijndael and Serpent did best on FPGA’s. Rjndael did
best in key setup latency, throughput and hardware utilization (throughput per
area unit). Rinjdael provided the best time performance across all three
platforms. [Andreas Dandalis, Viktor K. Prasanna, Jose D. P. Rolim]
"A dynamic FPGA implementation of the Serpent Block Cipher"
described the implementation of Serpent (bitslice
mode) in a Xilinx Virtex FPGA using Jbits (a Java-based configuration API for
Virtex developed by Xilinx). Jbits
allows several levels of abstraction for defining circuits and increases ease of
integration for systems that are partitioned between hardware and software.
This dynamic implementation was reported to achieve throughput of over 10
Gbits/sec. It was twice the speed
and half the size of a static implementation and had lower power consumption and
fewer package pins. [Cameron Patterson]
"A 12 Gbps DES Encryptor/Decryptor core in an FPGA" was presented as
the fastest implementation to date. It was implemented in Xilinx XCV300 and
XCV300E devices. Several different design optimizations were tried in order
to get maximum performance. One that provided a significant boost was to
apply understanding of the algorithm to force preferred mapping of the logic.
Changing pipelining of the key also contributed to increased performance.
The next step the authors plan to take is to reduce delay due to interconnect
routing by doing manual placement and floor planning. [Steve Trimberger,
Raymond Pang, Amit Singh]
"A 155 Mps triple-DES network encryptor" described a single chip running
at 250 MHz that could concurrently encrypt and decrypt two 155 Mbps data streams
using triple-DES in outer CBC mode. It was stated to be the first
implementation to achieve this level of speed and cryptographic strength.
A combination of full-custom and standard-cell designs along with a
standard 0.6 um CMOS process were used. The chip was also tested in a real world
environment by inserting it into a modified ATM Network Interface card.
Future work will include expansion of CAM/RAM to allow more virtual
circuits and addition of support for a microcontroller interface. [Herbert
Leitold, Wolfgang Mayerwieser, Udo Payer, Karl Christian Posch, Reinhard Posch,
Johannes Wolkerstorfer]
"An energy efficient reconfigurable public-key cryptography processor
architecture"
presented a possible approach to reducing the energy consumption of
cryptographic functions in energy constrained environments such as handheld
devices. This would be done by
implementing a Domain Specific Reconfigurable Cryptographic Processor (DSRCP).
A DSRP’s reconfigurability is limited to a domain of functions required
for asymmetric cryptography. This
reduces the overhead in terms of performance, energy consumption and reconfiguration
time. Estimates that were presented showed a potential of 30 to
180 times greater energy efficiency over conventional FPGA’s. [James Goodman,
Anantha Chandrakasan]
"High speed RSA hardware based on Barret’s Modular Reduction Method"
presented the design considerations for a high-speed hardware accelerator for
long integer modular exponentiation. It uses an optimized version (via hardware)
of Barret’s modular reduction method. The chip uses a partial parallel
multiplier to achiever a high degree of parallelism in the multiplier core
and can decrypt at a rate of up to 2 Mb/sec when exploiting the Chinese
Remainder Theorem. [Johann Grobschadl]
"Data integrity in hardware for modular arithmetic" addressed the issue
of fault detection while performing arithmetic in hardware.
Presented were practical methods for checking correctness of computations
required for the RSA cryptosystem and Diffie-Hellman key exchange.
One technique presented was a modular residue check that has a high
probability of finding a random or intermittent arithmetic fault and can also
find the majority of permanent faults (e.g. logic faults and fabrication
errors). [Colin D. Walter]
"A design for modular exponentiation coprocessor in mobile telecommunication
terminals" presented a design for a coprocessor suitable for implementing
public key cryptography in a mobile telecom terminal. Three requirements were
specified: concurrent double modular exponentiation at high speed, small form
factor and low power consumption, and resistance to side channel attacks.
right-to-left binary exponentiation algorithm was chosen and a novel circuit
configuration and schedule control method for doing the double modular
exponentiation calculations were presented. [Takehiko Kato, Satoru Ito, Jun
Anzai, Natsume Matsuzaki]
"How to explain side channel leakage to your kids" was an invited talk
by David Naccache that presented an entertaining overview of side-channel attack
techniques at a level suitable for your senior
managers (or your kids) who are not fluent in this area.
It included timing attacks, power attacks, and fault generation.
A copy of the slides are available from David Naccache. Also presented
was a discussion about the application of deconvolution to DPA desynchronization
countermeasures, for which there will be a paper
published in the near future [David Naccache, Michael Tunstall]
"On Boolean and arithmetic masking against differential power analysis"
presented an overview of differential power analysis and the types of
countermeasures that have been proposed. The focus was on showing that the Boolean
masking to arithmetic masking conversion algorithm proposed by Messerges
as a countermeasure to DPA is potentially susceptible to DPA
[Jean Sebastien Coron, Louis Goubin]
"Using second-order power analysis to attack DPA resistant software"
used experiments performed on an ST16 smartcard to demonstrate that first and
second order DPA attacks could be done. The author used a basic leakage model
based on Hamming weight to show that a data-whitening routine could be subject
to a successful 1st order DPA attack. A hardened version of this
routine was then shown to be susceptible to a 2nd order DPA attack.
[Thomas S. Messerges]
"Differential power analysis in the presence of hardware countermeasures"
was a discussion of methods for overcoming hardware based DPA countermeasures,
focusing on random process interrupts and noisy power consumption.
One type of proposed attack involved applying a sliding window to the DPA attack
described in Kocher’s “Differential Power Analysis”. A variant using
Hamming integration was also presented. [Christophe Clavier, Jean-Sebastian Coron,
Nora Dabbous]
At this point there were three talks on the subject of “Arithmetic Architectures”.
Unfortunately, I was unable to attend, so the three talks presented are
not included in this report. The web site listed above provides information
on these presentations.
"Physical Security Devices for Computer Subsystems: A Survey of Attacks and
Defense" presented a survey of physical
attacks and defenses ranging from cheap and easy to extremely costly and complex.
Physical attacks described included different types of probing (active, passive,
injector, pico-probes, energy probes), machining away material, application of
radiation and different types of electromagnetic energy, and even the use of
shaped charges(!). Corresponding countermeasures for these attacks were also
described including various types of physical barriers, ways to determine if
tampering took place, and self destruct mechanisms. [Steve H. Weingart]
"Software-hardware tradeoffs: application to A5/1 cryptanalysis"
discussed how a combined hardware/software approach to doing cryptanalysis
can provide a more efficient implementation than hardware or software alone.
An analysis of A5/1 was done with a combination of Xilinx FPGA’s on a Pamette
card and software running on an Alpha XP-1000 workstation.
Performance measurements were presented which demonstrated a
significant gain when using the software/hardware combination.
The authors state that this approach makes it possible to decrypt a GSM
communication in a realistic interception scenario. [Thomas Pornin, Jacques
Stern]
"MiniPASS: Authentication and Digital Signatures"
in a constrained environment described an implementation of the PASS
algorithm for execution on a smartcard. An
overview of the PASS scheme was presented followed by a description of the
implementation (MiniPASS) developed for this constrained environment.
A sample implementation done in C on a workstation was presented to show
the expected performance characteristics and size requirements. [Jeffrey
Hoffstein, Joseph H. Silverman]
Last but not least, “Efficient generation of prime numbers” presented
algebraic methods which substantially reduce the value of hidden constants,
allowing a significant improvement in the efficiency of prime number generation.
These methods were applied to DSA, strong, and ANSI X.9.31 primes. An almost
10 fold reduction in the number of rounds for Boneh and Franklin’s shared RSA keys
protocol was demonstrated. There was also discussion of application of these
methods to fast implementations of RSA on smartcards. [Marc Joye, Pascal Paillier,
Serge Vaudenay]
The remainder of this report are summaries of the sessions I attended during
the two day workshop. Authors are listed in square brackets at the end of
each summary. Invited talks were about 40 minutes, and presentations were
20 minutes in length. Your humble narrator apologizes in advance for any
errors or omissions, and is open to any corrections or other useful feedback.