The 2001 IEEE Symposium on Security and Privacy
Oakland, CA, USA, May 13-16, 2001
Review by Mary Ellen Zurko
July 10, 2001
The 2001 IEEE Symposium on Security and Privacy (www.ieee-security.org/TC/sp2001.html) was held May 13-16, 2001 at The Claremont Resort in Oakland, California, USA.
Program Co-Chair, Roger Needham, gave some background on the program. There were 113 submissions. They received 3 reviews each, assigned on a bidding basis. PC members said what they would like to review, and Program Co-Chair, Martin Abadi, did the assignments. Roger sees the anonymity of authors as a privilege rather than a duty. Authors can sacrifice it to make the reviewing job easier (particularly in terms of references).
The first session, Tamper-resistance and Cryptography, was chaired by Martin Abadi (Intertrust).
The first paper was "Cryptographic Security for Mobile Code" by Joy
Algesheimer, Christian Cachin, Jan Camenisch, and Gunter Karjoth (IBM Research).
Christian presented.
Their work is motivated by the need to protect mobile
code and the originator from attacks. Their privacy example is an auction or
negotiation between a vendor and a client with a browser. In that case, both
parties need to know the outcome of the transaction. Their model includes a
mobile agent that starts at an originator and goes to multiple hosts via a path that
is not preselected. Each participant sends and receives a single message.
Computation is with encrypted functions over encrypted data. Existing work on
homomorphic public-key encryption schemes and protocols that use them do not
address the need to provide output information to the host. There are also
trusted hardware solutions, which the authors consider expensive, not very
flexible, and requiring trust in all parties involved. This work builds on Yao's
encrypted circuit method. In addition to the construction and evaluation of the
circuit, that work includes an oblivious transfer guarantee, where the receiver
gets information about one of two messages and the sender does not know which
message the receiver got. There is still a need for an output for the host.
Their proposed solution is to extend this work with a generic Secure Computation
Service, which has none of the stated drawbacks of the trusted hardware
solution. This partially trusted service provider T does not collude with any
hosts Hi or the originator O. It gains no information for itself, and actively
provides a service for confidentiality. Similar services include anonymizers,
CAs, and trust audit agencies. The realization involves one round of interaction
with T. O constructs an encrypted circuit C computing the functions over the
inputs needed by host H and O, and the separately encrypted input tokens that H
must choose between. H sends its choice to T. T decrypts the appropriate set of
tokens and returns the cleartext to H. H uses the cleartext to evaluate the
encrypted circuit. The circuits/computations must have unique ids, so that T
never decrypts more than one set of tokens for each id. Adding robustness is
possible. Practically, this only works for small parts of a program, such as a
composition function or negotiation/auction strategy. As a concrete example, if
the circuit is realized with AES with 128-bit keys, computing the maximum of two
n-bit numbers will need less than n kilobytes of storage and about 100n AES
operations. Their conclusions are that mobile code security is not possible
without assuming trust in a third party, that it is possible using a generic online
service T, that T learns nothing about the computation, and that it could be
practical for small functions.
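As a rough illustration of the single round with T (a sketch of mine, not the authors' construction), the toy Python below models T as a service that releases exactly one of each pair of input tokens per circuit id. The garbled-circuit construction and evaluation, and the public-key encryption of the token pairs to T, are abstracted away, and names such as SecureComputationService and "auction-42" are invented for the example.

    import secrets

    class SecureComputationService:
        """Toy model of the partially trusted service T: for each circuit id it
        releases at most one token per input wire, and never serves an id twice."""
        def __init__(self):
            self.used_ids = set()
            self.store = {}          # circuit_id -> {wire: (token_for_0, token_for_1)}

        def register(self, circuit_id, wire_tokens):
            # In the real protocol O encrypts the token pairs under T's public key
            # and they travel via H; here O hands them to T directly for brevity.
            if circuit_id in self.used_ids or circuit_id in self.store:
                raise ValueError("circuit id must be fresh")
            self.store[circuit_id] = wire_tokens

        def release(self, circuit_id, choices):
            # Return the token matching H's input bit on each wire, once only.
            if circuit_id in self.used_ids:
                raise ValueError("tokens for this id were already released")
            self.used_ids.add(circuit_id)
            pairs = self.store.pop(circuit_id)
            return {w: pairs[w][bit] for w, bit in choices.items()}

    # Originator O: one fresh random token per possible value of each of H's input
    # bits; the encrypted circuit (not shown) would only be evaluable with the
    # tokens matching H's actual input.
    def make_tokens(wires):
        return {w: (secrets.token_hex(16), secrets.token_hex(16)) for w in wires}

    T = SecureComputationService()
    T.register("auction-42", make_tokens(["x0", "x1"]))
    released = T.release("auction-42", {"x0": 1, "x1": 0})
    print(released)                                  # H now evaluates the circuit
    try:
        T.release("auction-42", {"x0": 0, "x1": 1})  # a second query is refused
    except ValueError as e:
        print("T refused:", e)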
One question was how to get some assurance
that T is trustworthy. Christian suggested you may perform some computations or
check some subset or sample. The questioner instead wanted assurance that T is
not leaking information. Christian does not know if that can be solved. Another
question brought out that authenticating T is built into the model, with the
assumption that everyone has to have a public key (and a connected trust
infrastructure). Another attendee asked how their work exploits the fact that it
is mobile code (rather than a multi-party computation). The answer was that the
model for communication is different. Everyone has their own security
considerations and they are linked by a single outgoing and incoming message
each.
The next paper was "Networked Cryptographic Devices Resilient to
Capture" by Philip MackKenzie, Michael Reiter (Bell Labs, Lucent). Mike
presented.
While cryptographic algorithms and protocols are standardized and
strong, the end points which may hold private keys are still subject to viruses,
theft and reverse engineering. They are in some sense laughable and the weak
link. In a 2001 CSI/FBI survey, the top two causes of quantifiable losses were
due to viruses (35%) and laptop theft (27%). Private data is usually encrypted
under a password but passwords are notoriously easy to break. For example, a
study by Klein found 25% of the targeted passwords with a dictionary search. The
authors are interested in a software only approach to this problem. The device
cannot sign without testing the password at the server, limiting attackers to online
dictionary attacks. The server is untrusted: an attacker who captures the server
itself and a user's password cannot impersonate the device. Like the previous
paper, this is a service provider model. For an attacker that captures the
device, the goal is that the best they can do is send password guesses to the
server. If a strong attacker captures both, it can forge a signature only if it
succeeds in an offline dictionary attack against the password. Device
initialization does not involve the server. The client generates a token which
is encrypted with the server's public key, which when decrypted allows the
client to extract its private key if it has the right password. The token also
includes some random numbers that are saved on the device. The private key and
password necessary to create this initial token are then discarded. The device
generates an authenticator based on the input password, encrypts it, and sends
it to the server. If the password check passes on the server side, it sends back
information that allows the device to recover the private key. There is some
protection against DOS in the protocol (not presented). The system can be
extended to support key disabling, even from someone who steals the device and
knows the password. This provides defense against the disgruntled employee
scenario. It is not possible with the generic protocol, since the device
extracts the private key and it can be stored for future use. This is
complementary to public key revocation, which requires the checking of a CRL.
Using function sharing to share the function of encryption and signing between
the device and server, the key can be mathematically disabled. The server's
share of the computation is returned with the device's signing key. Mike showed
examples with RSA signatures and referenced the ElGamal decryption example in
the paper. There is an error in the proceedings; the authors have issued a tech
report through dimacs.rutgers.edu/ with an update.
The work includes a non-interactive zero knowledge proof that the server
computed its share correctly. Otherwise, the server could change the encryption
of the message in an undetected way. The work does not roll over to DSA
directly. Their two party DSA signature generating protocol will appear in
CRYPTO. It is more computationally intensive than the work described here. There
are simulation proofs showing the protocol is provably secure. The summary
stated that the work achieves quantifiable (and very high) security using
standard passwords or even PINs. It can achieve better security than was
previously possible, but it needs network connectivity to perform cryptographic
operations.
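The function-sharing idea for RSA can be illustrated with textbook RSA and an additively split private exponent: deleting the server's share "mathematically disables" the key. This is a schematic of the idea only, with toy (insecure) parameters, and not the authors' protocol, which also includes the password check and a proof that the server computed its share correctly.

    import secrets

    # Toy RSA parameters (far too small for real use); p and q are prime.
    p, q = 1000003, 1000033
    n, e = p * q, 65537
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)                    # full private exponent (Python 3.8+)

    # Split d additively: the device holds d1, the server holds d2 = d - d1.
    d1 = secrets.randbelow(phi)
    d2 = (d - d1) % phi

    def device_half(m):  return pow(m, d1, n)    # device's partial signature
    def server_half(m):  return pow(m, d2, n)    # server's partial signature

    m = 123456789 % n                      # message representative (a hash in practice)
    sig = (device_half(m) * server_half(m)) % n  # combine the two halves
    assert pow(sig, e, n) == m             # verifies like an ordinary RSA signature

    # Key disabling: once the server erases d2, neither party alone (nor an
    # attacker holding the device and the password) can produce new signatures.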
Questions challenged the efficacy of this approach against
viruses, since they can look at a great deal of residual information, including
what got typed in and the private key that is returned. The class of viruses
this works against is the sort that sucks the on-disk information off the
machine and sends it somewhere. Residual information protection is outside the
threat model. Someone asked if the server is completely untrusted, why bother to
have public keys, since the server must be trusted for revocation. The sense
that the server is untrusted is that if you take the status quo and involve the
server you get key disabling. So if the server does not honor the revocation,
you don't lose anything you had before use of the system. Another attendee said
that since fixed passwords are a bad idea in the first place, isn't this
overendowing them by building a straw herring and shooting it in the foot (I am
not making this up). Mike's response is that fixed passwords are still around,
and with this you get dramatically greater security. They've thought about
augmenting the system with one time passwords. Some cases can be done, and some
are awkward to do. An attendee pointed out the tradeoff is against a new DOS;
you can generate signatures that no one can verify.
The next paper was
"Protection of Keys Against Modification Attack" by Wai-wa Fung, Mordecai Golin,
Jim Gray (Hong Kong UST). Jim presented (he has been at RSA for about 3 years).
He pointed out that it's really Wai-wa's work. The key is recorded in the
EEPROM of a smartcard. The EEPROM cannot (easily) be read but it can be easily
modified. An attacker can also operate the smartcard. This work is inspired by
the Anderson and Kuhn attacks. Microprobing needles can write individual bits in
the EEPROM and are cheap. The attacker can try to run the card and see what
happens. An attack for DES relies on parity bit checks. Setting a particular bit
to 1 succeeds if it was previously 1, and causes the parity check to fail if it
was 0. This attack is generalized by running the device once to obtain the
correct output, then making modifications and comparing the new output with the
correct one. The modification attack exploits the fact that the key is stored
bare in the device. Perfect protection is impossible. The goal is that it is
hard for clever outsiders, with not much money, but lots of time and
intelligence, to get the key. They considered hiding the key in a random
location, which transfers the problem to finding the address of the key. The
cards must be mass produced, and the location would be stored in the EEPROM.
This work separates the physical representation of the key P from the actual
cryptographic key K. P is stored in EEPROM. A decoding function maps P to K. A
possible protection scheme introduces redundancy, so there is more than one
possible P for each K. P is decoded with a majority vote. It can still be broken
by a modification attack in O(length P) time. For any protection scheme if the
wiring function W and the decoding function D are known and deterministic, then
an attacker can break it in O(length of P) probes. Hidden wiring and secret
decoding functions and randomized decoding functions were not considered. The
first idea was that P is a permutation of K and the wiring inverts the
permutation. This can be broken in O(n) probes where n is length of K. They then
considered cascading M permutations. When M = 2, the wiring undoes both
permutations. All decoding functions must receive matching inputs before a key
is output. This is attacked by getting P, then using a brute force method to get
the wiring in O(n^2). If a protection scheme uses M different cascaded
permutations, the brute force technique requires only O(n^M) time for the
attack (still polynomial). Only a single card, rather than the whole batch,
needs to be compromised. Cards are normally manufactured in batches. The
encoding needs to hide the bit occurrences of 0's and 1's in K. It blinds the
key before using it. Blinding conceals the number of 1's and 0's in the key.
Temporary keys are generated at programming time. There are the same number of
extra keys as permutations. After reversing the permutations, it XORs all the
physical keys together, so they get cancelled out. For more redundancy, P can be
lengthened. You want to have a roughly even number of 1's and 0's in the blinded
keys. An analysis with a range of numbers of 1's and 0's is in the paper.
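A minimal sketch of the blinding idea as I understood it from the talk (not the exact scheme in the paper): the stored physical representation consists of several permuted shares whose XOR reconstructs K, so no individual stored bit, nor the 0/1 counts of the stored data, reveals anything about the key. The key value and share count below are invented for illustration.

    import secrets

    def xor_bits(a, b):
        return [x ^ y for x, y in zip(a, b)]

    def encode(key_bits, num_shares=3):
        """Split K into XOR shares and store each under its own permutation,
        so the stored representation P hides both the bits and the bit counts of K."""
        rng = secrets.SystemRandom()
        n = len(key_bits)
        shares = [[rng.randrange(2) for _ in range(n)] for _ in range(num_shares - 1)]
        last = list(key_bits)
        for s in shares:
            last = xor_bits(last, s)
        shares.append(last)                                  # XOR of all shares == K
        perms = [rng.sample(range(n), n) for _ in shares]    # wiring, fixed per batch
        stored = [[s[p[i]] for i in range(n)] for s, p in zip(shares, perms)]
        return stored, perms

    def decode(stored, perms):
        """The decoding circuit inverts each permutation, then XORs the shares."""
        n = len(perms[0])
        key = [0] * n
        for s, p in zip(stored, perms):
            inv = [0] * n
            for i, src in enumerate(p):
                inv[src] = s[i]
            key = xor_bits(key, inv)
        return key

    key = [int(b) for b in "1011001110001111"]
    P, wiring = encode(key)
    assert decode(P, wiring) == key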
Questioning brought up that the scenario is that there are small batches for
a given wiring (about 2 thousand). Someone asked if they compared the strength
with an error correction code to represent P, so that a certain number of bits
is needed to change P. They hadn't.
The session called Intrusion and Anomaly Detection I was chaired by Marc Dacier (IBM Zurich Research Center).
The paper "Data Mining Methods for Detection of New Malicious
Executables" by Matthew Schultz, Eleazar Eskin, Erez Zadok, and Sal Stolfo
(Columbia University) was presented by Eleazar.
Eight to ten new malicious
executables are created every day and traditional detection methods are
vulnerable. Their idea is to use data mining to detect new malicious
executables. Traditional signature based methods can not generalize to new
executables. Virus scanners with heuristic components are hand crafted by
experts, so they are expensive, and their rules are not generalizable to other
platforms. The data mining approach finds patterns in benign and malicious
executables and uses these patterns to classify new binaries. The advantages are
it is automatic and generalizes to new executables. The model requires less
frequent updates. It produces a probabilistic output. The prediction algorithm
extracts features from the suspect binary and computes the probability of it
being malicious given its features and a model of malicious and benign
executables. The modeling algorithm extracts features from binaries labeled
malicious or benign and computes statistics over those features. Given a set of
features F and a class C, the goal is to compute P(C|F). They use Bayes Rule,
since it's a difficult quantity to compute directly. It's easiest to compute the
probability of the class itself P(C). They use the naive Bayes independence
assumption that each feature occurs independently. This is not realistic but it
works in practice. They compute the most likely class and output whichever
probability is larger. Each feature contributes to the final outcome.
Discriminative features are present in one of the classes and have a large
impact on classification. The data mining features include OS specific features.
For Win32, this includes DLL and resource information obtained from the
executable headers using objdump.exe. The OS independent features they use
include the strings extracted directly from the binary. Their experimental data
set consisted of 3265 malicious programs and 1001 benign ones. 244 of these were
in Portable Executable (PE) format. They did a 5-fold cross validation to
simulate new executables. 4/5ths of the data was used to generate the model
(training) and 1/5th was used to test the model (test set). This was repeated 5
times. Their baseline was a signature scanner. It extracted all signatures from
malicious executables and chose a signature for each malicious executable that
did not occur in the benign executables. It labeled each executable malicious if
it contained a malicious signature. The approach was based on a standard
anti-virus signature approach. The detection rate for the signature based
approach was 33.96%; for their Bayesian approach it was 96%. There were 0 false
positives with the signature approach vs. a 6.01% false positive rate with the Bayesian approach. They
calculate that tuning the algorithm to a 1% false positive rate would produce a
75% detection rate, while a 98% detection rate would produce a 12% false
positive rate. The results are only preliminary; this is a smaller data set than
the 50K known existing malicious programs. The data set is all stand alone
programs; it does not include infected programs, and it is not realistic data
that is found in the wild. They're looking for help finding a larger data set.
They also hope to work on more efficient algorithms.
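A minimal naive Bayes sketch of the kind of classifier described above, using made-up string features rather than the authors' corpus or exact feature extraction; the Laplace smoothing and the restriction to features present in a binary are simplifications of mine.

    import math
    from collections import Counter, defaultdict

    def train(labeled_examples):
        """labeled_examples: list of (set_of_string_features, 'malicious'|'benign').
        Returns class counts and per-class feature counts plus the vocabulary."""
        class_counts, feature_counts, vocab = Counter(), defaultdict(Counter), set()
        for features, label in labeled_examples:
            class_counts[label] += 1
            for f in features:
                feature_counts[label][f] += 1
                vocab.add(f)
        return class_counts, feature_counts, vocab

    def classify(features, class_counts, feature_counts, vocab):
        """Return the most likely class under the naive independence assumption."""
        total = sum(class_counts.values())
        scores = {}
        for label, count in class_counts.items():
            score = math.log(count / total)                    # log P(C)
            denom = sum(feature_counts[label].values()) + len(vocab)
            for f in features & vocab:                         # log P(f | C), smoothed
                score += math.log((feature_counts[label][f] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

    # Toy strings standing in for features extracted from binaries.
    data = [
        ({"GetProcAddress", "CreateRemoteThread", "WriteProcessMemory"}, "malicious"),
        ({"MessageBoxA", "GetProcAddress", "ExitProcess"}, "benign"),
        ({"WriteProcessMemory", "RegSetValueExA", "WinExec"}, "malicious"),
        ({"printf", "fopen", "ExitProcess"}, "benign"),
    ]
    model = train(data)
    print(classify({"WriteProcessMemory", "GetProcAddress"}, *model))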
One attendee
conjectured that the false alarm rate was strongly influenced by the data.
Someone asked what the difference between pattern recognition and data mining
is. The response is that it's a philosophical question. The presenter's background is
in machine learning, but it's called data mining in security. Someone suggested
that trying to implicitly extract behaviors is problematic. The rule sets
created are human readable, but they don't make sense (they're not something
you'd think of). They just perform statistically well. Someone pointed out that
there could be problems with a poor selection of features, or a class of
programs that don't do anything malicious only because our OSes don't force you
to do something malicious. The response was that this method can find slightly
different clones inspired by something like the ILOVEYOU virus. A question
brought out that they are pruning the feature set if the count is close for both
classes. Another question brought out that they are looking at thousands and
thousands of features; string extraction in particular produces a lot. There was
some discussion about labeling broad sub classes of viruses. Someone pointed out
that a moderately smart virus could behave normally for a while with an embedded
encrypted program, then attack after running a number of times.
The next
paper was "Evaluation of Intrusion Detectors: A Decision Theory Approach" by
John Gaffney (Lockheed Martin) and Jacob Ulvila (Decision Science Associates,
Inc). Jacob presented.
They present a comprehensive way to think about the
evaluation of Intrusion Detection Systems (IDSs): the expected cost metric. It
can also be used to compare IDSs with each other, against having no detector at
all, and against a detection goal. It can also choose the best operating point
for an IDS when the system is tunable, and be used to adjust the operation of an
IDS in response to a changing environment. Some methods used or proposed for
evaluating IDSs include comparing Receiver Operating Characteristic (ROC)
curves (detection probability vs. false alarm rate), setting a
particular goal for probability of detection and false alarm rate, cost based
modeling, and false alarm rate. Metrics proposed include the area under the ROC
curve, total cost from a simulation, the threshold false alarm rate where
response becomes "unmanageable", and "distance" from a goal or threshold. Their
decision analysis approach is to combine ROC and cost based methods. This
extends to consider the hostility of the environment. This produces an expected
cost metric. It is described in the simplest case needed to make their points in
the paper, but it can be extended to more complicated situations. They use a
decision tree for calculating the expected cost of an operating point on an
IDS's ROC curve. Costs include failure to respond when there is an intrusion and
cost of responding when there is no intrusion. The detector provides the
probability of an alarm given an intrusion. The system needs the probability of
no intrusion given an alarm, the probability of an intrusion given an alarm and
the probability of an alarm. They are calculated under Bayes' theorem.
Techniques from Bayesian statistics, mathematical psychology, decision analysis
need to be used to compensate for the base rate fallacy; individuals fail to account
for the base rate when estimating a probability. They estimated based on the
data available and guidelines. The costs also need to be estimated, including
damage costs, and monetary and other outage costs. Then determine the optimal
operating point that minimizes expected cost. You can find the optimal points
for each detector; then get the difference in expected costs. They produced two
estimated ROC curves to fit the DARPA data. With costs and probabilities
applied, either could be preferred to the original study's goal point. Their
conclusions were that meeting a goal on the probabilities of detection and false
alarms is inadequate, the area under the ROC curve is not a good metric,
connecting discrete ROC points with straight lines is pointless, and the lowest
expected cost will always be at one of the points. Expected cost is a superior
method for evaluating IDSs and should be used in actual evaluations. The method
should be included in IDS research and development projects.
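In the spirit of the decision-tree formulation described above, here is a minimal expected-cost calculation; the ROC points, base rate, and cost figures are invented for illustration and are not from the paper.

    def expected_cost(p_detect, p_false_alarm, p_intrusion,
                      cost_miss, cost_false_response, cost_respond_hit=0.0):
        """Expected cost per event for one ROC operating point (simplified):
        missed intrusions cost cost_miss, responding to a false alarm costs
        cost_false_response, responding to a real intrusion costs cost_respond_hit."""
        p_no_intrusion = 1.0 - p_intrusion
        return (p_intrusion * ((1 - p_detect) * cost_miss + p_detect * cost_respond_hit)
                + p_no_intrusion * p_false_alarm * cost_false_response)

    # Pick the operating point on an (estimated) ROC curve with the lowest cost.
    roc_points = [(0.50, 0.001), (0.75, 0.01), (0.90, 0.05), (0.98, 0.12)]
    base_rate = 1e-4                       # hostile-event prior; illustrative only
    best = min(roc_points,
               key=lambda pt: expected_cost(pt[0], pt[1], base_rate,
                                            cost_miss=5000.0,
                                            cost_false_response=10.0))
    print("lowest expected cost at (P_detect, P_false_alarm) =", best)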
Someone asked
if the model dealt with the cost of responding to alarms. Expected cost vs. no
detector could be looked at. They don't currently have a lot of detail on that.
An attendee pointed out that they make evaluation even more complex. Will we
ever get something usable? It is subject to our ability to make estimates. Maybe
you can just look at the false alarms because if that's too high, no one will
respond. The severity of some attacks could put you out of business. Questioning
brought out that an attacker might generate a lot of false alarms to shut down
the system or discredit an IDS. Another attendee pointed out that about 15 years
ago in a large circulation IT magazine, someone indicated you could target the
cryptographic checksums on messages: systematically cause a link to randomly
fail for a month or two, and the people responding would let bogus transfers go
through in order to get their normal job done.
The session on Information Flow was chaired by Virgil Gligor (U Maryland).
"On
Confidentiality and Algorithms" by Johan Agat and David Sands (Chalmers
University of Technology) was presented by David.
Mobile code provides a new
motivating example for continuing research into whether code keeps its secrets.
Programs may help you find a doctor, get medical information, and store your
medical records. The current focus is on techniques such as static analysis and
proof carrying code. NonInterference (NI) specifies that the observable behavior
an attacker sees is not influenced by high secret data. Information can be
relayed via timing and delays. Can useful secure programs be written? The
folklore is that NI is too strong in practice and that "timing channels are
impossible". We need to understand more to figure out how to best loosen the
strict information flow conditions to achieve practical programs. Timing issues
are real and shouldn't be ignored. Are there efficient algorithms which satisfy
the conditions of secure information flow? This paper presents a study of
algorithms for sorting and searching collections of secret data. The timing
behavior includes caching. Recently referenced items reside in the cache, and
fetching from the cache is much cheaper than from main memory. It is easy to construct
a covert timing channel from cache behavior. For example, a variable may or may
not be cached, depending on the branch previously taken. Structure size cannot
be kept secret; indexing out of range can leak the size and running time always
depends on structure size. One approach is to make all operations take the same
time as the worst case operation. To provide deterministic search structures,
O(log n) structures (AVL trees, 2-3 trees) can be modified according to the
worst case principle. Fixed hashing won't work. It has a non-constant factor
worst case since collisions take asymptotically more time. If keys are stored
via pointers there are (additional) cache-leak problems. If the algorithm
inserts items in a different order depending on a branch and if they're pointers
into main memory, the attacker makes them collide in the cache, so that the
system can only cache one or the other. This forces an information leak.
Standard heapsort is easily adapted using the worst case principle; the
difference in time is quite small. Adapting Batcher's odd-even merging we get
O(n log^2 n) time, O(n) space, and a completely static comparison order
independent of the data in the structure. Randomized algorithms can provide
probabilistic noninterference. Randomized quicksort chooses the pivot randomly
so that all splits are equally likely. The expected complexity is O(n log n)
with any input and the distribution of running times is not dependent on the
elements. Future work includes other important classes of algorithms, formal
verification/program logics, and assuming pre-certified library components.
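To make the "completely static comparison order" point concrete, here is the standard power-of-two Batcher odd-even merge sort (my sketch of the general construction, not the authors' adapted version): the sequence of index pairs compared depends only on the input length, never on the secret element values, so the comparison schedule leaks nothing about the data.

    def compare_exchange(a, i, j):
        # The only data-dependent effect is the swap itself; the sequence of
        # (i, j) comparisons is fixed by n alone, not by the element values.
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]

    def odd_even_merge(a, lo, n, r):
        step = r * 2
        if step < n:
            odd_even_merge(a, lo, n, step)        # merge the even subsequence
            odd_even_merge(a, lo + r, n, step)    # merge the odd subsequence
            for i in range(lo + r, lo + n - r, step):
                compare_exchange(a, i, i + r)
        else:
            compare_exchange(a, lo, lo + r)

    def odd_even_merge_sort(a, lo=0, n=None):
        """Batcher's odd-even merge sort; n must be a power of two."""
        if n is None:
            n = len(a)
        if n > 1:
            m = n // 2
            odd_even_merge_sort(a, lo, m)
            odd_even_merge_sort(a, lo + m, m)
            odd_even_merge(a, lo, n, 1)

    data = [7, 3, 9, 1, 8, 2, 6, 5]
    odd_even_merge_sort(data)
    print(data)                                   # [1, 2, 3, 5, 6, 7, 8, 9]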
A questioner was concerned that this requires the cooperation of application
programs which may be hard to get without control of the source code. Did they
consider the fuzzy time work? This work's perspective is compositional. They
want to verify the components and then allow programs to use them freely. The
attacker can arbitrarily magnify the timing differences. A little bit of noise
doesn't solve the problem. Another attendee pointed out that the NI method by
randomized hashing is theoretically insecure. Is this a limitation of the model?
Yes, they assume that the hacker has limited computation power.
The next
paper was "Preserving Information Flow Properties under Refinement" by Heiko
Mantel (German Research Centre for Artificial Intelligence, DFKI).
Stepwise
development is crucial for complex systems. This involves
composition/decomposition and refinement. You start with an abstract
specification of what the system does at some level. It satisfies certain
properties that are specified declaratively, and they are proven. From the
abstract you refine to a more concrete specification. Look again at the
property; does the concrete specification satisfy it? System specifications use
an event based model. Traces are sequences of events. An event is an atomic
action, like the transmission of a message. Subset refinement removes
non-determinism. Safety and liveness properties are properties of single traces,
while information flow properties are properties of sets of traces (closure
properties). An alternative
to verifying the closure property on a set of traces is to verify the unwinding
conditions. The unwinding theorem states that if there is an unwinding relation
such that the unwinding conditions are satisfied then the information flow
property holds. The example in the talk involved
three events, two of which
are visible. Each can occur at most once, in any order. Previous suggestions
include restricting the information flow properties. This is not fully
compatible with non-determinism. Systems may be rejected as insecure because
they are non-deterministic, even if they are secure. No satisfactory property
has been found so far. Another approach is to repair after refinement. A problem
with that is that information flow properties need to be reinvestigated after
every development step. The specification must be repaired in terms of complete
traces, but the repair might not terminate at a single level. This work proposes
to restrict refinement such that it preserves the unwinding conditions. What is
new is that this system exploits the proof of the information flow properties at
the abstract level. It provides two refinement operators that take the abstract
specification and event pairs to be disabled. For pairs involving a high level
event, that subset can be disabled. For the low subset there are two
possibilities: disable additional pairs not in the set or disable at most what
is in the set. The refinement operators return a concrete specification.
Refinement operations for the perfect security property, forward correctability,
and so on are not restricted to the two level flow policy. The calculation of
refinement terminates. Future work includes other notions of refinement such as
data refinement and possibly action refinement, how to ensure the minimality of
the unwinding relation and how to preserve it (important for optimality), case
studies such as mobile computing, the bigger picture for information flow
properties being refined from abstract specifications to concrete code, and work
on programming languages.
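The refinement problem can be illustrated with a two-event toy of my own (not the talk's three-event example and not Mantel's operators): a simple possibilistic closure condition holds of an abstract specification but is destroyed by an ordinary subset refinement that merely removes non-determinism.

    HIGH, LOW = "h", "l"          # one confidential event, one visible event

    def purge(trace):
        """Remove high events; the behavior a low-level observer could account for."""
        return tuple(e for e in trace if e != HIGH)

    def secure(traces):
        """A simple possibilistic condition in the spirit of the closure properties
        discussed in the talk: for every possible trace, the same low-level
        observation must also be possible without the high event."""
        return all(purge(t) in traces for t in traces)

    # Abstract specification: h and l may each occur at most once, in any order.
    abstract = {(), (HIGH,), (LOW,), (HIGH, LOW), (LOW, HIGH)}
    print(secure(abstract))                      # True

    # Subset refinement that merely removes non-determinism...
    concrete = abstract - {(LOW,)}
    print(secure(concrete))                      # False: observing l now betrays h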
Questions focused on whether or not bi-simulation
(?) was better. Since I [Mez] don't know what it is, I was unable to capture
them in a useful form. Answers indicated that the author had not explored that
avenue, and that his approach does not provide an easy to understand abstract
property.
The session on Access control and trust management was chaired by Paul Karger (IBM).
"Understanding Trust Management Systems" was
presented by its author, Stephen Weeks (InterTrust Technologies).
The paper
is based on the research he did over the last few years to help with a decision
on which trust management (TM) system to use. The mathematical model lets him
pigeonhole the systems and talk about them. In his definition of trust management systems, a trust
management engine (TME) mediates access. The authorization inputs are
(principal, object, access) tuples. The access matrix is not a centralized
object. It is carried in lots of different certificates issued to principals in
the system. A different access matrix is conveyed by each certificate (cert).
The TME combines certs and decides whether to allow the access to the object by
the principal specified. Certificates may delegate from one principal's access
matrix to another. The TME takes a principal (who authorizes), an authorization
(what the requester wants to do), and the set of certs. It checks what that principal
authorizes. The model needs to cover authorizations and delegation. The access
matrices form a lattice. You can take any two and join them in a natural order.
The TME combines authorizations with a join. Examples of authorization lattice
include PKIX, Mass and Herzberg's 2000 role based paper, the SPKI name
component, and Keynote for check writing. The model for certs and delegation
starts with principal P authorizing action a. Delegation is that P authorizes a
if p' authorizes a (or a'). You can delegate to multiple principals (and). Then
a cert is modeled as an issuing principal and a function, such that if principal p
issues function f and M is the map of what is authorized elsewhere, then p
authorizes f(M). The function is monotonic. If you present a set of certs that
prove you are authorized and it says no, you shouldn't be able to present a
smaller set and get yes. All certs are monotone (they cannot be revoked). You
can model what goes on in TME as finding a fixpoint. In an initial map, all
principals authorize nothing. Compute a new map by looking at all the certs for
a particular principal and applying them to the previous map. Joining the results
in the lattice of authorizations shows what's been granted. You get to a fixpoint
when there is no change from one point to another. The paper presents the
semantics of SPKI and Keynote. You can improve an existing system by extending
the cert language to express more monotone functions. You can compare existing
systems within the same framework and study concepts applicable to many TM
systems. The model can also be used to design new systems.
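A toy rendering of the fixpoint computation (my sketch, not Weeks' formal semantics): authorizations are sets with union as the join, a cert is an issuer plus a monotone function over the current map, and the engine iterates until nothing changes. The principals, cert functions, and the "read" action are invented for the example.

    def tm_decision(certs, principal, authorization):
        """Toy trust-management engine: compute the least fixpoint of applying all
        certs, then check whether the principal authorizes the given tuple."""
        principals = {p for p, _ in certs} | {principal}
        auth = {p: frozenset() for p in principals}          # bottom: nothing granted
        while True:
            new = {}
            for p in principals:
                granted = set(auth[p])
                for issuer, f in certs:
                    if issuer == p:
                        granted |= f(auth)                   # f is monotone in auth
                new[p] = frozenset(granted)
            if new == auth:                                  # fixpoint reached
                return authorization in auth[principal]
            auth = new

    # Example: alice authorizes bob to read; the owner delegates to alice, i.e. the
    # owner authorizes whatever alice authorizes.
    certs = [
        ("alice", lambda m: frozenset({("bob", "read")})),
        ("owner", lambda m: m["alice"]),
    ]
    print(tm_decision(certs, "owner", ("bob", "read")))      # True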
An attendee asked
if this rules out certs that restrict access. Yes, they must be monotone. An
attendee noted that it cannot model, for example, the Brewer-Nash Chinese Wall
model which is used in investment banking, where you can lose access to related
data. Someone asked if he could relate the model to what's happening at
Intertrust. Stephen said No. Several questions were about the lack of revocation
in the model. He doesn't know of any good expressiveness models between this and
Turing complete, but that would be nice as a model. Someone asked about a
comparison of the Lampson, Abadi, et al work on an authorization logic in the
'90s. He found it simpler to think in terms of a lattice instead of a logic. There are just
functions and the trust management computation is simpler; just a fixpoint.
Someone asked why not a model theory for those logics? This is a very
operational model. It just says, given the certs, here's what you need to
compute the TM decision. There's not much insight into what entities in the
system believe.
The next paper was "SD3: a trust management system with
certified evaluation" by Trevor Jim (AT&T Labs Research).
This paper
follows well from the one before it, as it addressed the question "What if the TME
client does not have the credentials needed for the authorization
decision?". The TME could go out and get stuff, providing distributed trust
management. The application formulates a question. If the TME can answer the
question based on the certs input, it will. Otherwise, it sends a message to
another TME that might provide needed credentials. The original TME as well as
the called TME can contact other TMEs. We can use this to build interesting
distributed systems such as PKIs. This seems to be more complicated than SPKI,
and you might worry about it becoming too complicated to be trustworthy. This
work shows how to write an interesting PKI with the system (a secure version of
DNS) and a technique for the complexity problem (certified evaluation). It makes
the trusted computing base very small (comparable to SPKI). DNS is an insecure
protocol that maps host names to addresses. For scalability, it also allows
delegation of a portion of the mapping/namespace. To find the source for a
particular name, the local namespace authority knows the root name server and
can ask it for the address. The root knows what subdomains are delegated where
and tells the local name server the next candidate down the tree to ask. This
continues on down the tree to the target name. Assuming there are address and
key resource records for the root and other name servers, and delegation
records, next you need a resolver. This resolver defines a relation such that it
holds if the hostname n has address a. The system fills in the address when
queried. You can get multiple facts for multiple addresses. There are a number
of cases the resolver needs to address. First case, if there's an address record
locally for the name (fairly straightforward). Case 2, talk securely down the
name server delegation tree (you have looked up the key and address at the
current node, which is directly above the next step in the delegation tree).
Case 3, every address record is signed by a key. Only pay attention to the ones
signed by the right party. Case 4 is the recursive case. Look recursively down
the tree. Case 5, the name is nowhere below. Go to the root. The author has a
prototype that shows a simulation. If you write a security policy, having it
small means you can understand and analyze it, which is an advantage. The
loophole is that this is the application source code, but what about the TME? It
is complicated. With certified evaluation, the source code is what really runs.
The uncertified evaluation version takes the policy, query, and certs. There is
a signature check, an optimizer, a cache, and a core evaluator. It sends out the
query to the network, gets certs back, evaluation continues, and an answer comes
out. That system is augmented with an answer and proof between the core
evaluator and checker on the way to the answer. It produces a proof that the
answer follows the policy. It is a very detailed proof; it is long, but each step
is very simple. Checking the proof is easier than computing it. The system can't
produce a wrong answer, as a wrong answer would have a wrong proof. If you have a
right answer and a wrong proof, then there's a bug in the evaluator. An example of a proof is in
the paper. In conclusion, the system relies on trust management + cert retrieval
+ certified evaluation, policies are understandable (at least they're quite
short) in comparison with BIND, and certified evaluation makes the TCB small.
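The answer-plus-proof idea can be illustrated with a toy of my own (not SD3's Datalog language, DNS policy, or proof format): an untrusted evaluator searches for a delegation chain and emits a step-by-step proof, and a small trusted checker accepts each step only if it follows from the base facts or from an earlier step by the single chaining rule. The "edge"/"trusts" relations and the domain names are invented for the example.

    FACTS = {("edge", "root", "com"),
             ("edge", "com", "example.com"),
             ("edge", "example.com", "www.example.com")}

    def evaluate(start, goal):
        """Untrusted evaluator: returns (answer, proof).  A proof step is
        (('trusts', a, c), justification) with justification 'edge' or ('via', b)."""
        proof = []
        def derive(a, c, seen):
            if ("edge", a, c) in FACTS:
                proof.append((("trusts", a, c), "edge"))
                return True
            for (_, b, c2) in sorted(FACTS):
                if c2 == c and b not in seen and derive(a, b, seen | {b}):
                    proof.append((("trusts", a, c), ("via", b)))
                    return True
            return False
        return derive(start, goal, {start}), proof

    def check(claim, proof):
        """Small trusted checker: every step must be a base fact or an application
        of the chaining rule to an already-accepted step; the claim must be proven."""
        proven = set()
        for concl, how in proof:
            _, a, c = concl
            if how == "edge":
                ok = ("edge", a, c) in FACTS
            else:
                b = how[1]
                ok = ("trusts", a, b) in proven and ("edge", b, c) in FACTS
            if not ok:
                return False
            proven.add(concl)
        return claim in proven

    answer, proof = evaluate("root", "www.example.com")
    print(answer, check(("trusts", "root", "www.example.com"), proof))   # True True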
An attendee asked what the large real version of BIND does that this 5 case
implementation doesn't do. The author said that his code is approximately what
BIND should be doing. BIND supports revocation, update, and a different message
format. If he changes his message format he could plug it in. But it doesn't
have the revocation that is necessary for scalability. Someone asked if when you
scale up, is there a disproportionate scaling in the size of the proofs and
checker; does it scale if you're supporting the warts in BIND, or is it
disproportionately expensive? The author believes it scales.
The next
paper, "Formal Treatment of Certificate Revocation Under Communal Access
Control" by Xuhui Ao, Naftaly Minsky, and Victoria Ungureanu (Rutgers
University), was presented by Naftaly.
The talk was mostly about the communal
access control (CAC) aspect. The certificate revocation information is in the
paper. The conventional approach to distributed AC tends to be server centric.
Each server implements its own policy often implicit in its code. This is
appropriate for client-server applications whenever the server is autonomous and
in charge, but not with an enterprise wide policy on lots of servers. You need a
concept of CAC policy that governs certain interactions. They use dealing with
patient records as a case study. Consider a large, geographically distributed
hospital, whose patient records are handled by several heterogeneous servers
(R-servers). The hospital has a single policy governing the entry, update, and
retrieval of all patient records. The server policy PR might include that
servers must be certified by a CA called admin, doctors must be specified by
both a board and the admin CA, a copy of every record is sent to a designated
audit trail server, and that a nurse may be a surrogate for a doctor. When a
certificate that authenticated a server expires, the server's ability to receive
records should be nullified immediately. However, a doctor's expiration should
be given a grace period. It is difficult to ensure all servers implement the same
policy. Some relevant interactions do not involve a server (such as a doctor
appointing a nurse). Thus the policy PR is inherently communal. Law Governed
Interaction (LGI) is a message exchange mechanism that enables a community of
distributed agents to interact under an explicitly and strictly enforced
communal policy. They have a prototype of this. The membership of the community C
is open, and can be large. Nothing is assumed about the structure and behavior
of members of C, which can be quite heterogeneous. It supports a wide range of
policies and a spectrum of security concerns from inadvertent to malicious
violations. The deployment is incremental and easy. Law enforcement is entirely
decentralized for scalability, avoiding congestion and a single point of
failure. Replication cannot provide these same benefits for a stateful policy
with frequent changes in state. There is a personal controller for each agent.
Each has the same Law L and the same Interpreter I. The controller only contains
Control States relevant to agent x. Controllers send messages between each
other. If a controller is close to the client, the extra messages are
short/cheap. Laws are enforced locally at each agent, without any loss of
generality. Laws deal explicitly only with the local events at individual
agents. The ruling of a law for an event e at agent x is a function of e and the
local control state. While the law is local, the effect is global as the
community is uniformly controlled by L. An agent has to find and adopt a
controller. It then can send and receive messages. Messages contain a hash of
the law L so controllers know what law the communicating party operates under.
In that case, they are trusted to comply with the same law. Certification of
controllers is also possible. Enforcement does not compel anybody, though
something may be effectively compelled if it needs services only available under
that law.
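As a sketch of the local-ruling idea (an illustrative fragment of mine, not the actual hospital policy PR or LGI's law language), the function below computes a ruling purely from a local event and the agent's own control state, returning operations for the controller to carry out; the event fields, role names, and "audit-trail-server" are invented for the example.

    def law_PR(event, control_state):
        """Toy ruling function in the spirit of LGI: the ruling for a local event is
        a function of the event and this agent's control state only."""
        if event["type"] == "sent" and event["op"] == "send_record":
            # Records may only go to servers this agent knows to be certified.
            if event["to"] not in control_state.get("certified_servers", set()):
                return [("block", "destination is not a certified record server")]
            return [("forward", event),
                    ("forward", {**event, "to": "audit-trail-server"})]  # audit copy
        if event["type"] == "sent" and event["op"] == "appoint_surrogate":
            if "doctor" not in control_state.get("roles", set()):
                return [("block", "only a doctor may appoint a nurse as surrogate")]
            return [("forward", event)]
        return [("forward", event)]          # default: pass the message through

    # Each controller holds only the control state of its own agent.
    state_of_dr_jones = {"roles": {"doctor"}, "certified_servers": {"r-server-3"}}
    event = {"type": "sent", "op": "send_record", "to": "r-server-3",
             "record": "patient-117"}
    print(law_PR(event, state_of_dr_jones))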
An attendee pointed out that if the law is not static, L's may be
out of sync. Yes, if you cannot stop the community for a change, you have to do it
in a distributed fashion. They are pursuing it. Another question brought out
that their definition of scalability is that the policy itself is basically
local. You can increase the number of agents by a factor of n, and have no
effect at all on any one agent. To the question, does the system apply to a
transactional system with workflow ordering, Naftaly answered yes. A questioner
asked about whether another protocol was needed to check the internal
consistency of a law L. Another pointed out that in the UK, they drive on the
left, while in the US we drive on the right. Under this system, they are two different
communities.
The session Intrusion and Anomaly Detection II was chaired by Marc Dacier (IBM Zurich Research Center).
The paper
"Information-Theoretic Measures for Anomaly Detection" by Wenke Lee and Dong
Xiang (North Carolina State University) was presented by Wenke.
The current
state of the practice in anomaly detection is that there is a relatively high
false alarm rate, ad-hoc approaches in development and evaluation, systems are
specific to a particular environment, there are no theoretical foundations, and
it's difficult to model what's normal. The authors hypothesize that the
regularity of normal data can be used for model construction, as it provides
predictability of (future) activity. Their approach is to use
information-theoretic measures entropy and conditional entropy to model
regularity. This is follow on work to Maxion and Tan (2000). They found that to
work well in a deployment environment, the IDS training data had to be similar
to the environment. Entropy and conditional entropy can characterize the
"impurity irregularity" of the data set. The smaller (the more regular), the
better, the more likely to see them in the future. The "irregularity" of
sequential dependencies indicates the "uncertainty" of a sequence after seeing
its prefix. Again, the smaller (the more regular), the better. Relative
(conditional) entropy indicates how different a distribution p is from a
distribution q. This can be used to say how different the regularity of the test
data is from that of the training data. Again, the smaller, the better.
Intrusion Detection (ID) can be viewed as a
classification problem. There are normal events, intrusions, and anomaly (don't
know) events coming through the system. Maybe the anomalies should be set aside
and looked at by the human experts. The system categorizes data into the proper
category. The approach to categorizing doesn't matter. You select features, test
values, and partition based on that. The larger the information gain, the purer
the subset, the better the classification performance. The smaller the
conditional entropy, the better the performance of the classifier. For accuracy,
the more information available for analysis, the better. However, for
efficiency, that is not better. There is a trade off consideration. Assign cost
to information; the processing time. They did a case study of anomaly detection
for Unix processes. Using the Forrest et al approach, they took "short
sequences" of system calls as a normal profile, using a sliding window of length
k over the system calls. You don't see any new sequences under normal
conditions. With a classification approach, given the first k system calls,
you try to predict the k + 1st system call. How to determine the "sequence
length" k? Looking at the conditional entropy of the training data, as you
increase k, the entropy drops (so it's easier to predict the next call). It
stabilizes around 6; that's what Forrest used. For classifiers trained and
tested on normal data, the lower the error rate, the better. The smaller the
conditional entropy, the lower the error rate. They tested their approach on
intrusion traces. When the length is short, the error rate is larger. Say the
cost is linear in the sequence length. There is an accuracy/cost trade-off.
Accuracy = 1 - error rate, but you don't have a measurement of accuracy before
building the model, so estimate it as 1 - conditional entropy. There is information about
the MIT data set in the paper. The "regularity" of data can guide the model
construction process. For sequential data, conditional entropy directly
influences the detection performance. With cost also considered, this may
produce the "optimal" model. Detection performance on test data can be attained
only if the regularity is similar to the training data. Future work includes
studying how to measure more complex environments, and extending the
principle/approach for misuse detection.
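The conditional-entropy measure for sequential data can be estimated directly from traces; the sketch below (my code, with a made-up toy trace rather than the MIT/DARPA data) slides a window of length k+1 over a system call sequence and computes H(next call | previous k calls), which shrinks as the data becomes more regular.

    import math
    from collections import Counter

    def conditional_entropy(trace, k):
        """H(next call | previous k calls), estimated from one trace.
        Smaller values mean the next system call is easier to predict."""
        ctx_counts, pair_counts = Counter(), Counter()
        for i in range(len(trace) - k):
            ctx = tuple(trace[i:i + k])
            ctx_counts[ctx] += 1
            pair_counts[(ctx, trace[i + k])] += 1
        total = sum(pair_counts.values())
        h = 0.0
        for (ctx, nxt), c in pair_counts.items():
            p_pair = c / total                       # P(context, next)
            p_next_given_ctx = c / ctx_counts[ctx]   # P(next | context)
            h -= p_pair * math.log2(p_next_given_ctx)
        return h

    # Toy trace standing in for the system calls of one process.
    trace = ["open", "read", "write", "close"] * 50 + ["open", "read", "read", "close"] * 5
    for k in (1, 2, 3, 6):
        print(k, round(conditional_entropy(trace, k), 3))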
One question was whether this
approach could detect attacks where the program is running a normal sequence but
the results are not what is desired. No, you can't catch that in this model. It
doesn't model frequencies of sequences, for example. The questioner was
interested in catching spam. Wenke said you could detect that at the mail server
by looking at frequency of requests. An attendee said that they would like that
the complexity of avoiding this technique is sufficiently high to gate realistic
intrusions. They are very far from having an attack space definition to come up
with the model. A questioner said that they need to acknowledge that the data
they worked with was synthesized. They have not tried it against random data.
"A Fast Automaton-Based Method for Detecting Anomalous Program
Behaviors" by R Sekar (SUNY), M Bendre (SUNY), Pradeep Bolineni (Asera), and D.
Dhurjati (SUNY) was presented by Sekar.
This work also looks at program
behavior, following up on Forrest's work. A key problem is, What is a good way
to represent/learn information about system call sequences? There are issues
around compactness, accuracy, and performance. Forrest et al compared several
methods for learning system call sequences. They found memorizing sub-sequences
of length N (N-grams) to be most appropriate. Markov models provided a slight
increase in accuracy but incurred much higher overheads. This paper develops a
new method that significantly improves over the n-gram method. The false
positive rate and training time requirements are better. There are several
drawbacks of the N-gram method. The number of N-grams grows exponentially and N
must be small in practice (N=6 suggested). The implication is that it is
difficult to capture long-term correlations. The system remembers the exact set
of N-grams seen during training; there is no generalization. This necessitates
long training periods or a high rate of false alarms. Their alternative
representation is a Finite State Automaton (FSA). Even an infinite number of
sequences of unbounded length can be represented. They naturally capture program
structure such as loops, if then else, etc. However, learning FSA from sequences
is computationally intractable and no automated method for learning FSA has been
developed. One difficulty is in learning FSA from strings, since strings do not
provide any information about the internal states of an FSA. Also, how do you
answer questions such as, when a state repeats, is it a new state or a return to
an old state? Their approach involves interception of system calls using ptrace
(as before) and examining the process stack to obtain program counter
information. Dynamic linking poses a problem. The same function may be loaded at
different locations during different runs. They use the program counter value
corresponding to the code calling the dynamically loaded library. As a side
benefit, ignoring library behavior makes the FSA more compact. At detection
time, a mismatch may occur in terms of either the system call or program
location. They use a leaky bucket algorithm for aggregation of mismatches. The
program counter helps resynch even after seeing behavior not seen in training.
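A minimal sketch of this idea (my code, not the authors' implementation): states are program-counter values, transitions are labeled with system calls, anomalous transitions are aggregated with a leaky bucket, and the program counter lets the monitor resynchronize. The penalty, leak rate, threshold, call names, and addresses are all invented for illustration; real data would come from ptrace interception and stack walking.

    from collections import defaultdict

    class FSAMonitor:
        def __init__(self, leak=1, threshold=5):
            self.transitions = defaultdict(set)   # prev_pc -> {(syscall, next_pc)}
            self.leak, self.threshold = leak, threshold

        def train(self, run):                     # run: list of (syscall, pc)
            prev = "START"
            for syscall, pc in run:
                self.transitions[prev].add((syscall, pc))
                prev = pc

        def detect(self, run):
            prev, bucket = "START", 0
            for syscall, pc in run:
                if (syscall, pc) not in self.transitions.get(prev, set()):
                    bucket += 3                   # penalty per mismatch
                bucket = max(0, bucket - self.leak)
                if bucket >= self.threshold:
                    return True                   # raise an alarm
                prev = pc                         # the pc resynchronizes the FSA
            return False

    monitor = FSAMonitor()
    monitor.train([("open", 0x401A), ("read", 0x4031), ("close", 0x404F)])
    print(monitor.detect([("open", 0x401A), ("read", 0x4031), ("close", 0x404F)]))  # False
    print(monitor.detect([("open", 0x401A), ("execve", 0x9999), ("execve", 0x9999),
                          ("execve", 0x9999)]))                                      # True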
They did experimental evaluation on ftp, http, and nfs servers. "Real" data from
live servers requires sufficient traffic load, so it was only done for httpd. It
is also difficult to control, not repeatable, and it is difficult to get code
coverage. They used simulated data for the other two programs. A script
generates requests, generating all commands accepted by the server with a
probability distribution intended to mirror real data, using randomization. The
FSA method converges much faster than n-grams. It can make do with roughly an
order of magnitude less training than n-grams. The false positive rate also
comes down at roughly the same rate. The two issues are closely related. The
differences are smaller with the live data (a factor of 5 instead of 10 in the
false positive rate). Exploits that involve deviation in program behavior can be
easily detected. An attack that involves system call arguments can be missed.
The FSA method provides a compact representation, without limiting system call
sequence lengths. In the future, they would like to incorporate system call
argument values, use source code analysis to augment runtime learning (which is
actually covered in the next talk), and learn frequency of execution
information. A questioner asked why the training period looks highly non-linear.
It is conceivable that the possible variations have been exhausted. Using
scripts could also factor in. Another question called out that the curve is not
linear in any sense. The graph is logarithmic scale. It is monotonically
decreasing.
Questioning also brought out that an example of a situation where
you would get a new N-gram for an automaton that already recognizes that state is
combinations of branch types within a program, before they actually occur.
Someone commented that if the only case occurs during an intrusion, the FSA will
miss it. There was discussion about what intrusion cases might not be
recognized. Even without the leaky bucket algorithm, with buffer overflow, there
is no single anomaly. If you're looking at the stack, stack corruption could be
detected. But it's possible to craft a string that fools the system. Maybe if
the stack looks corrupted, you don't wait any more. Related to the size of FSAs,
they are ignoring a lot of things. On how this could be used with an
interpreter, implementation would be at a different point, such as where
security decisions are made rather than system calls.
"Intrusion
Detection via Static Analysis" by David Wagner (UC Berkeley) and Drew Dean
(Xerox PARC) was presented by David.
Anomaly detection can detect unforeseen
attacks. But one of the hard problems is finding a good model. There are many
metrics of good. The problem they looked at was finding a model to reduce the
false alarm rate. A quote from Internet Week indicates it's a real issue in
practice. A sys admin may turn ID off after a few mis-timed false alarms. With a
server doing millions of transactions, even a low false alarm rate will give a
lot of false alarms. You can deduce a model from the program source code, using
static analysis. There is no training data set. An example of an attack it
catches is one that inserts malicious code. It constrains the executions of
programs so that no actions are inconsistent with the source code. As a tiny example, the
model might be the set S of system calls the program could ever call. The
enforcement would be that if the program's syscall trace contains anything not
in S, raise an alarm. It's not recommended, but it gives the flavor. The talk
the day before that looked at the set of DLL system calls used the same model.
Such a static model is complete; it includes all correct behavior and it gives
no false alarms. It is very hard to get complete coverage with run time testing;
it needs a very large training set. You might see the control flow graph as an
intermediate form in a compiler. There is a vertex for each statement and edges
for control flow. Another model takes a finite automaton M and asks does M
accept the program's syscall trace? The model is built from the control-flow
graph of the program. It is similar to the regular expression matching problem.
The abstract stack approach provides a model that excludes impossible paths.
Using a non-deterministic pushdown automaton M, can the program's syscall trace
be parsed by M? The digraph model takes the set S of all possible pairs of
system calls that can happen consecutively (next to each other). All observed
syscall pairs should be in S. They built an implementation and measured it
running on real applications. Sendmail was the most complex application they
looked at. They graphed performance overhead (seconds per transaction, like mail
message received) vs. how precise the model is. You would like the model to be
at least as large as the set of all possible behaviors, but hopefully not too
much larger. They got widely varying results. The digraph model is the cheapest
in performance, but its precision is not great. The abstract stack is more precise,
but the performance overhead is large. The call graph is somewhere in the
middle.
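A toy rendering of the digraph model's runtime check (my sketch, not the authors' tool): S is the set of syscall pairs that can occur consecutively, derived here from a hand-written control-flow summary rather than by real static analysis of source code, and any observed pair outside S raises an alarm.

    def digraph_model(call_successors):
        """Build S, the set of ordered syscall pairs that can occur consecutively.
        Here the successor relation is given directly; in the paper it is derived
        by static analysis of the source code, so there is no training data."""
        return {(call, nxt) for call, succs in call_successors.items() for nxt in succs}

    def monitor(trace, allowed):
        """Raise an alarm if any observed consecutive syscall pair is not in S."""
        for a, b in zip(trace, trace[1:]):
            if (a, b) not in allowed:
                return f"alarm: unexpected pair {(a, b)}"
        return "ok"

    # Hypothetical control-flow information for a tiny server loop.
    cfg_pairs = {"accept": {"read"}, "read": {"write", "close"},
                 "write": {"read", "close"}, "close": {"accept"}}
    S = digraph_model(cfg_pairs)
    print(monitor(["accept", "read", "write", "close", "accept"], S))
    print(monitor(["accept", "read", "execve"], S))     # injected code shows up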
Someone asked about the detection probabilities. You give up
something in terms of attack detection. You can't detect any attacks that don't
change the behavior of the application. It did not detect race conditions and
application logic bugs. It can detect buffer overruns, format string bug
attacks, and introduction of malicious code into the process space. Someone
indicated this work is similar to work published in RAID in 1999, although that
work did not have the same rigorous reasoning. Questioning brought out that it
won't detect attacks on self-modifying code, but the model seems reasonable for
C network daemons with high privilege.
The Cryptographic Protocols I session was chaired by Dan Simon (Microsoft Research).
"Performance of Public
Key-Enabled Kerberos Authentication in Large Networks" by Alan Harbitter (PEC
Solutions) and Daniel A. Menasce (George Mason University) was presented by
Alan.
The contribution of this work is the modeling itself. They develop a
performance modeling technique that is applicable to security protocols and
their unique characteristics, and investigate the quantitative characteristics
of PK-enabled Kerberos in large, multi-realm networks. You need a ticket to the
remote KDC to work between realms. The Public Key (PK) enhancement at the
initial client request for a ticket from the local Key Distribution Center (KDC)
is called PKINIT. The local KDC and the remote one used to need to share a
secret key. The change to a public key for establishing the trust is called
PKCROSS. Adding public key encryption between the client and server eliminates
the KDC entirely, and produces the much more efficient protocol PKTAPP. In
PKINIT, the PK is in only the first message sequence. In PKCROSS, the client
(Alice) requests a key to a remote KDC. There is no shared secret key. Real time
authentication between the two KDCs establishes the secret key for Alice, and it
is put into an encrypted ticket. PKTAPP skips KDCs and goes right to the server.
The purpose of the model is to quantitatively compare the performance
characteristics of the PKCROSS and PKTAPP protocol variants. It should allow
them to easily change input parameters and conduct what-if analysis. Once they
have validated the model it can test other situations. They use a queuing
network model of authentication. The model is characterized by the queuing
discipline at each service center (first come first serve, for example), the
service time distributions, and the branching patterns (connectivity). The brute
force solution technique is to itemize each state and solve a big matrix. A more
tractable approach is to make simplifying assumptions (about homogeneity of
clients waiting in the queue, for example) and iterative approximations. They
can consider all the clients who want the same type of encryption as in the same
class. They have the topology laid out, model the input parameters and service
time in each "class", and the branching probability in each class. When a
customer leaves a queuing center, it may change classes with a probability. Each
class may have differing service times, branching probabilities and class
transition probabilities. The formulation is well studied and understood. They
developed a closed queuing network model of Kerberos multi-realm authentication
and then validated the model. They used a PKCROSS skeleton implementation that
contains all of the message parsing, encryption and communications, but excluded
error handling and protocol options. The PKTAPP skeleton implementation had the
encryption operations performed by each protocol. In the multi-realm test bed
they found that the model accurately reflected the measured results. The
predicted response time was within 3% of the test bed measurements. In comparing
PKCROSS to PKTAPP with multiple application servers, they found that with
PKCROSS, the remote KDC is the bottleneck. PKTAPP is significantly better for a
single application server. The performance is sensitive to network and server
capacities. The model is generally applicable in protocols that combine public
and secret key encryption. They demonstrated that PKCROSS outperforms PKTAPP in
situations where there is more than one server per realm. They would like to
consider CRL management performance and use the modeling methodology to
investigate other authentication protocol environments such as mobile computing.
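To give a flavor of the kind of closed queuing network described above, here is a minimal sketch using exact single-class Mean Value Analysis. It is only illustrative: the authors' model is multi-class with class switching and approximate iterative solution, and the service demands below are invented.

    # Minimal sketch of a closed queuing network solved with exact Mean Value
    # Analysis (MVA). The paper's model is multi-class with class switching;
    # this single-class version and its parameters are illustrative only.

    def mva(service_demands, n_clients, think_time=0.0):
        """Return (throughput, response_time) for a closed queuing network."""
        queue_len = [0.0] * len(service_demands)   # mean queue length per center
        for n in range(1, n_clients + 1):
            # residence time at each center as seen by an arriving client
            residence = [d * (1 + q) for d, q in zip(service_demands, queue_len)]
            resp = sum(residence)
            throughput = n / (resp + think_time)
            queue_len = [throughput * r for r in residence]
        return throughput, resp

    # Hypothetical service demands (seconds) at local KDC, remote KDC, app server:
    demands = [0.004, 0.012, 0.006]
    x, r = mva(demands, n_clients=50, think_time=5.0)
    print(f"throughput={x:.1f} req/s, response time={r*1000:.1f} ms")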
Questions were about how PKTAPP can be a Kerberos variant, since it looks a
lot like SSH. The Kerberos message format is retained. It doesn't retain any
state on either side. Some people think the KDC is integral to Kerberos.
"A Model for Asynchronous Reactive Systems and its Application to Secure
Message Transmission" by Brigit Pfitzmann (Universitat des Saarlandes) and
Michael Waidner (IBM Research) was presented by Michael.
They are working
towards cryptographically correct and formal proofs of cryptographic protocols.
Typically you get only one or the other. In cryptography, mathematically precise
and realistic definitions are possible. The definitions and proofs are long,
difficult to grasp and error prone. They are missing probability spaces,
adversary definition, active attacks, error probabilities, and computational
restrictions. Formal methods are well defined, provide protocol languages, and
some tool support is available. But, for example, the Dolev-Yao abstraction
doesn't capture cryptographic realities and no realistic proofs are possible.
They just have functions representing encryption and decryption; there is no
looking under the covers. The goal of this work is a faithful abstraction. The
abstract primitives, protocol and goals are mapped to concrete. The system
consists of honest users and an adversary as interacting I/O automata. All the
honest users are collapsed into a single machine and communicate arbitrarily. There
is a notion of time and a clock. All the adversaries are collapsed. They can
take over some communication lines, which are buffered asynchronous connections.
The main difference in this model is that the adversary controls the timing of almost
everything. When you compare the real configuration with the ideal configuration,
from the honest user's (H) point of view, the two views should be identical. The
real is then as secure as the ideal. You define what H sees and can define
probability spaces for the machines. View(H) is a random variable and it
captures all aspects of security. There are simulatability variants that are
equivalent with "guess". The general model includes collections of probabilistic
I/O automata and distributed scheduling. It is implemented by Turing machines,
for complexity reasons. In the security specific system model, the system is a
set of structures. Configurations represent a "standard" real system's intended
structure. The channel model indicates what influence the adversary has on
communications. The composition theorem approaches the proofs of security in a
somewhat modular way. The proof is in the proceedings. They can prove that a
real system is at least as secure as the theoretic system. They can replace one
system with another and get a particular level of security. The proof idea is to
take systems, combine them, replace parts, and recombine. In a real system, any
participant can send authentication and secret messages to any other
participant. It is all scheduled by the adversary. In the ideal system, there is
one single machine, TH. It has an interface to the adversary, so it can observe
things. If someone is sending something to someone else it can be observed. It
can send messages to itself. The adversary can select to determine which
messages are actually delivered to the honest users. The real system doesn't
depend on any cryptographic details. They use a cryptographic proof, then put
formal methods on top of it. The overall approach (which includes previous work)
provides a rigorous model and precise cryptographic semantics, abstract
interfaces, deterministic specifications with tolerable imperfections. The new
aspects are an asynchronous system with distributed scheduling, a composition
theorem, and reactive multi-user encryption.
The "What's really different" session was chaired by David Wagner (UC Berkeley).
"Cryptographic Key
Generation from Voice" by Fabian Monrose, Michael Reiter, Qi Li, and Susanne
Wetzel (Bell Labs, Lucent) was presented by Mike.
They are trying to use
voice to generate a key that can be used in applications. The adversary is
someone who steals your mobile phone. Voice is a natural UI for many people and
many devices. It is known to differentiate between users and there is a rich
literature in speaker verification. Unlike static biometrics such as
fingerprint, changing the password changes the vocalization of it, so a user can
have many keys. The speaker utters a password or phrase which is used to index
into a table. From what is selected based on a voice utterance, it can build a
key and encrypt a file on the device. They aimed for security (the key should be
resilient to offline attacks), reliability (false negative rates should be
small), and efficiency (since it would be implemented on resource constrained
devices such as third generation phones and PDAs). With encryption using a spoken
password, the password entropy is low, and even lower for pronounceable
passwords. No plaintext anything is stored in the device that's speaker
dependent. The work derives from a scheme that takes into account typing
features and password to get a hardened password with more entropy than the
password itself. They broke the key into shares with a secret sharing scheme.
There is an M x 2 table (M is the bit length of the key). They use the utterance
to index into the shares of the table. For each of the M features, they pull out one
of the 2 shares in the corresponding row. There is no additional security at first. Over
time, they notice patterns or features called distinguishing features. The
system can destroy the parts that the user doesn't need. The user should be able
to log in, but an imposter shouldn't. Conceptually, this can be used in any
context where traditional passwords are used, such as file encryption. It can be
used to generate private/public key pairs, for VPN access, and for email
decryption. The email can only be in plaintext inside the phone. Processing the
voice password starts with capturing the signal at 8,000 samples/second. They
remove the silence, and break into windows 30ms long, overlapping by 10ms. They
apply a discrete Fourier transform to take it into a frequency vs. amplitude
curve. Other transforms include getting the twelve cepstrum coefficients, which
do a good job of modeling the vocal tract of the user. The segmentation
technique they apply is similar to segmental vector quantization where segments
are mapped using maximum likelihood. The approaches to generate feature
descriptors include the position relative to a plane (in the paper), centroid
parity and distance from centroid. If the attacker captures the device, it must
reconstruct the key. It can start by cutting through the table lots of times.
It's hard to tell the difference between a random piece and a real share in the
table. The guessing entropy is how many vertical cuts through the table are
needed before finding the right one. The attacker has perfect knowledge of how
you speak and what you say. They applied real utterances to the table construction,
analyzing 90 users saying a different set of 1-800 numbers. The guessing entropy grows with
the log of the number of users.
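A minimal sketch of the M x 2 share-table idea may help. This is a simplified XOR-based variant, not the authors' construction: initially both entries of each row are identical, so any feature reading reconstructs the key; once a feature is seen to be distinguishing, the entry the user never selects is destroyed (overwritten with noise). All names and sizes below are invented for illustration.

    # Simplified sketch of the M x 2 share table described above (illustrative
    # XOR-based variant, not the authors' exact construction).
    import secrets, functools

    M = 16                  # illustrative number of features / table rows
    SHARE_BYTES = 16

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # Rows initially hold two identical shares, so any feature vector
    # reconstructs the same key (no added security at first).
    shares = [secrets.token_bytes(SHARE_BYTES) for _ in range(M)]
    table = [[s, s] for s in shares]
    key = functools.reduce(xor, shares)

    def harden(table, feature_bits):
        # Once feature i is observed to be a reliable (distinguishing) feature,
        # destroy the share the user never selects by replacing it with noise.
        for i, b in enumerate(feature_bits):
            table[i][1 - b] = secrets.token_bytes(SHARE_BYTES)

    def reconstruct(table, feature_bits):
        selected = [table[i][b] for i, b in enumerate(feature_bits)]
        return functools.reduce(xor, selected)

    user_features = [secrets.randbelow(2) for _ in range(M)]  # stand-in for real features
    harden(table, user_features)
    assert reconstruct(table, user_features) == key           # legitimate user succeeds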
There were questions about password content
and length (in time) in the real world. They don't honestly know how secure it
is. The evidence indicates it's a direction to go. Just like any biometric, if
you have a high quality recording of the utterance you have all the key
information. But they don't know if you can do that over a distance. They view
the 2% false positive rate as too high. Someone asked, what about a dictionary
attack with someone who speaks like me? If it's the same region and same sex,
then what are the rates? Work needs to be done. Someone pointed out that Gus
Simmons demonstrated, in breaking RSA-based encryption of voice, that there was
so little entropy that one could predict what was being said.
"A Trend Analysis
of Exploitations" by Hilary Browne, William Arbaugh (UMD), John McHugh, and
William Fithen (CERT/CC) was presented by William.
Are over 90% of security
incidents due to known problems? It seems anecdotally true, but how do we
provide stronger evidence? The authors performed an analysis of past intrusions
using the CERT/CC historical database. It took about 3 years to do. They
searched the CERT summary records for key words and vulnerability numbers
(automated) then reviewed the summary record and electronic mail to ensure that
the data point was valid (manual). If the evidence didn't support the fact that
an intrusion took place, then the record was not counted. This results in an
under count. They tried to identify the number of hosts affected by each
incident, but were careful not to over count. If there was no hard evidence that
intrusion was from the specific vulnerability under study, they didn't count it.
For example, hackers make lists of which hosts have a particular vulnerability,
but these lists were not used. Intrusion reports are self-selecting. People
can't report what they don't know or understand. The human element includes
errors and boredom. Until recently, CERT records were not conducive to
analysis. Also, you don't always know how you were broken into if it's a new
vulnerability. The human doesn't necessarily do a thorough evaluation of the
reason for the break. Hackers may choose a new kind of break, just for fun. They
did not find what they expected to find. They expected a curve starting from the
discovery of a vulnerability, through some intrusions, with a steep increase.
Then some time after the patch is released, the patch is installed, and the
intrusion rate decreases. Bruce Schneier wrote this up in a Crypto-Gram newsletter.
There was a very large positive skew to the data. The vulnerability was discovered
and patched, and still there were no intrusions. The first serious intrusions
came about 3 months later, when the attack was scripted. Then it takes off. This
is proof that script kiddies exist (as if it were needed). For a vulnerability
that was patched in March '96, as of November '98, they're still seeing
intrusions. The CERT data supports the hypothesis that well over 90% of the
security incidents reported could have been prevented. Attackers have automated
(scripting) and as a result react faster than the defenders. Analysis of several
incident histograms indicated that the intrusions accumulated with a similar
shape. They then asked, can we predict the severity? If they can find a model
that fits, then they may be able to predict the severity of incidents. They used
curve fitting (no statements about any potential relationship). The accumulation
function is linear in nature. The initial analysis focused on examining the data
on a monthly basis. This demonstrated useful results but introduced a bias (not
all months are of equal length). The prediction was not useful after three
months. They're looking at a daily analysis now. The regression was done after
30 days of activity, at which point they generated a slope, then plotted the
rest of the intrusion set series. The prediction was accurate for about 3 weeks
for one vulnerability, about 70 more days for another, and for about 30 more
days for another. Then the data fell off. It appears possible to predict the
severity of intrusions, but the best prediction must be dynamic (generate a new
curve when the error grows too large). This could be used as a prioritization
mechanism for patching (though just getting people to patch to begin with would
be nice). They're working with a statistician to gain greater insights, by
grouping the data better and doing multivariate regression. They are starting
new analysis from the scripting date and continuing to collect more data. They
are going to focus on methods to tighten the defenders' decision loop.
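The prediction step described above can be sketched as a fit-then-extrapolate loop. The incident counts below are invented; the authors work from the CERT data and are moving to daily granularity and multivariate regression.

    # Sketch of the prediction step: fit a line to the first 30 days of
    # cumulative intrusion counts, then extrapolate. Data are invented.
    import numpy as np

    days = np.arange(1, 121)
    cumulative = np.cumsum(np.random.poisson(2.5, size=days.size))  # fake incident data

    fit_window = days <= 30
    slope, intercept = np.polyfit(days[fit_window], cumulative[fit_window], 1)
    predicted = slope * days + intercept

    # Re-fit ("generate a new curve") once the prediction error grows too large.
    error = np.abs(predicted - cumulative)
    too_large = error > 0.25 * cumulative
    refit_day = days[too_large][0] if np.any(too_large) else None
    print(f"fitted slope: {slope:.2f} intrusions/day, refit needed around day {refit_day}")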
An
attendee pointed out that there is a lot of sample bias (only people who report
to CERT). The data indicates that very wide spread intrusions tend to have
vulnerabilities that persist for a long time. Maybe it's really the other way
around. The authors are trying to find statistical significance. Someone asked
if there are intrusions occurring before the discovery and announcement dates.
The hacker community is fairly boastful. Someone pointed out that the data is
consistent with the statement that nobody patches much. The authors wanted to do a
traditional epidemiological study. However, with this data, if you don't know
you have an intrusion, you don't figure it out. If you do figure it out, say 6
months later, all evidence of how they broke in is gone and the exact cause
can't be determined. They're looking for ideas on how to quantify the data sets. Someone asked if the time of year was important. They had looked at
seasonal aspects. It seemed to correspond to academic years. Questioning brought
out that the authors want to start looking at probes, but they're not
reported to CERT. It was pointed out that there is some anecdotal evidence that
exploits fall off because they're old news to the hackers.
The session of 5-minute presentations on developing research was chaired by Stuart Haber (InterTrust).
Joe Weiss presented "Information security needs for EMS applications". All of these systems were designed to be vulnerable: interoperable, user configurable, open, and providing as much information as possible to anyone who needed it. Backfitting technology like encryption produces denial of service problems. They have done vulnerability studies; they're vulnerable to access by people who shouldn't have it. So far, there's no need to be afraid, because hackers want to get at a web site. There are none of those on the back end of industrial processes.
Tim Levin and Cynthia Irvine (who presented) authored "Overview of quality-of-security service". Security is often treated more statically than QoS. They're interested in devising a layered distributed system where security choices get passed down through the layers, which limit the choices.
Evie Spyropoulou presented "Managing costs and variability of security services". Related to the previous talk, they offer security levels like high, medium and low, which are mapped down the layers. There is a model for determining the costs for variant security. The calculation of costs can be used for an efficient schedule. Accepted values are restricted to a range.
David Lie (who presented), Dan Boneh, and others authored "XOM: Architectural support for copy- and tamper-resistant software". eXecute Only Memory (XOM) uses hardware or architectural support. Programs are encrypted so that they have a one-to-one relationship with a XOM machine. The set of registers has ownership tags, indicating which program owns the values in the registers. It can guard against slicing, splicing and replay attacks. They have a paper this year at ASPLOS. It may be similar to the IBM processor paper presented in '97 at IEEE S&P.
Calvin Ko presented "SHIM: System Health and Intrusion Monitoring". The goal is to detect incorrect system behavior that can evolve into security failures. The approach is to use a hierarchy of constraints that describe the system at different levels of abstraction.
Hao Chen presented "Model-checking software for security violations". Risky code will do a seteuid(0) and then later do an fopen without realizing the heightened privilege it is running with. They model the risky ordering with a state machine and have a framework to formally check for risky states. If all privileged operations were lexically scoped, this problem would be less severe.
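Purely as an illustration of the risky ordering (written in Python rather than C, for consistency with the other sketches here; the calls only succeed in a process that starts with root privileges):

    # Illustration only: a privileged operation followed by a file open while
    # the process still holds root privileges, versus dropping privilege first.
    import os

    def risky(path):
        os.seteuid(0)                 # enter the privileged state
        return open(path).read()      # opened while euid is 0: the "risky" state

    def safer(path):
        os.seteuid(0)
        try:
            pass                      # do only the work that truly needs privilege here
        finally:
            os.seteuid(os.getuid())   # drop privilege before touching user-supplied paths
        return open(path).read()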
Ludovic Me presented "A language for a comprehensive description of attacks". They have an attack skeleton with descriptions from the attacker's point of view, the defender's point of view, and how to counter the attack. There could be more information to describe an attack in the future.
Dawn Song presented "Timing analysis of keystrokes and timing attacks on SSH". Each keystroke is sent in a separate packet immediately. Using a Gaussian modeling of character pairs, they used a Hidden Markov Model to infer passwords from inter-keystroke timing. Timings leak 1 bit per character pair for random passwords. It can reduce the work factor 50 times for length 8 passwords. The paper will be at Usenix Security Symposium.
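A toy sketch of the idea: model the inter-keystroke latency of each character pair as a Gaussian and rank candidate pairs by likelihood. The actual attack runs a Hidden Markov Model over whole password sequences; the latency statistics below are invented for illustration.

    # Toy sketch of per-pair Gaussian likelihood scoring (invented statistics).
    import math

    pair_stats = {              # hypothetical (mean_ms, std_ms) per character pair
        ("a", "s"): (95, 20),
        ("a", "l"): (140, 30),
        ("q", "z"): (210, 45),
    }

    def likelihood(latency_ms, mean, std):
        return math.exp(-((latency_ms - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

    observed = 150              # milliseconds between two sniffed SSH packets
    ranked = sorted(pair_stats, key=lambda p: -likelihood(observed, *pair_stats[p]))
    print("most likely pairs:", ranked)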
Al Valdes presented "Simulation analysis of an intrusion-tolerant system". System sensors are managed by a common interface which is also distributed. They are exploring response tradeoffs as part of a larger dependable intrusion tolerant system study.
Ajay Chander presented "A role-based trust management framework". Their distributed framework takes the clustering from roles and the decentralized distributed environment, local namespaces and local say about policies from trust management. The system includes linked local roles and role delegation.
Jabeom Gu (presenting), Sehyon Park, and Jayong Ahn authored "Secure mobile multi-casting protocol and key management". Mobile nodes are serviced from the service agent in the service area that they reside in. They have simulation results.
Jeff Jianxin Yan presented "An attack on black-box traitor-tracing schemes". Every user has a distinguished private key to decrypt the contents. Users can leak their keys to pirates (traitors). They use a Boneh-Franklin scheme to generate an intelligent pirate.
Tao Ye presented "The security challenges in peer-to-peer business integration". They want to extend P2P to business processes. The decentralized system (which may be offline) needs local authentication. They will have compartmentalization of the agent contents into portfolios and access control to portfolio contents. They need host security against malicious agents and agent security against malicious hosts (detection of tampering, prevention of access), and administration of credentials across heterogeneous business domain boundaries.
Michael VanPutte presented what was probably the most popular 5 minute talk, "SimSecurity". They want to produce security people who have technical skills and think outside the box. Traditional training doesn't teach or even encourage out of the box thinking. Training is boring, but games are fun! They're hiring professional gamers to produce something like The Sims Meets Matrix. You can go into god mode and see everything and see how poorly you designed it. There will be a training mode where the security instructor builds scenarios. An interactive laboratory will have a certain amount of money to build a system or attack it. It will be a complex adaptive system with evolutionary algorithms.
Larry Koved (presenting), Aaron Kershenbaum, and Marco Pistoia authored "Permission analysis for the Java 2 platform". They do security analysis via static analysis of very large programs, focusing on the precise invocation graph and data flow. They focused on the security parts and threw away the parts that aren't relevant.
Jesus Molina presented "The co-processor as an independent auditor". He wants to provide a high assurance integrity checking mechanism for COTS OS. They use an embedded coprocessor as an independent auditor for verification of embedded digital signatures. It can be either active or passive.
Fred Cohen gave an impromptu talk. He joked "A copy of my slides is available; it's already on your computer". They have 35 full time people working on interesting security projects this summer. They can emulate a whole class B network that looks good to all the vulnerability scanners. They have IP addresses that change in real time. They want to implant tags into information so they know where it's been.
The final day
started with an invited talk by Pamela Samuelson (School of Information
Management and Systems, UC Berkeley), "Reverse Engineering: A Legal Right or
Wrong?". She drew on two papers she is writing, on the law and economics of
reverse engineering and the constitutionally based interest in reverse
engineering in the US, to explore why reverse engineering has been legal. The
legality of reverse engineering has been challenged recently in a number of
contexts, particularly the decompilation of software. The DMCA limits the right
to reverse engineer. The US constitution may limit application of the DMCA to
some reverse engineering. Reverse engineering has long been a proper means to
obtain a trade secret. There is no right to reverse engineer a patented
invention. But the patent is supposed to teach how to make it. It is OK to take
apart or analyze a purchased machine or compound; patent law doesn't reach this.
Experimental use for science is a defense. The issue didn't arise in copyright
law because the law didn't protect industrial products and the contents of the
work was revealed on the face of the product. There is a long and bitter
controversy about whether decompilation of computer programs for purposes such
as gaining access to the interface information should be lawful. Sega v.
Accolade and other cases since '92 say that this is lawful under copyright. The
copying necessary is not illegal since it doesn't threaten competition. IEEE has
been vigilant and active on the legal rights to reverse engineer software. UCITA
(if enacted in your state) may enforce shrink-wrap license restrictions on
reverse engineering and undo intellectual property (IP) rights to reverse
engineering. It's currently only a law in MD. In Iowa, a contract that says it's
under UCITA is actually under Iowa law. It is more controversial than its
proponents anticipated. (As a humorous aside, Pamela mentioned that she likes
the Italian pronunciation of UCITA; u chee' ta). Decompiling software may
inadvertently infringe a patent, even if you're not trying to get access to a
patented algorithm. There is no fair use doctrine in patent law. It may require
legislative change to establish this right clearly. The economic espionage act
of 1996 has no reverse engineering limit and makes duplication of a trade secret
a violation. It federalizes trade secrecy law. While many states' trade secret laws
have reverse engineering clauses, the federal law doesn't. The DMCA exception for reverse
engineering only permits it for program interoperability.
She then discussed
DeCSS and shrinkwraps. Jon Johansen bought a DVD. He couldn't play it because
of country codes. He reverse engineered CSS, wrote DeCSS to bypass it, and
posted the program on the net. He also hoped to help development of an open
source Linux player for DVDs. DVD CCA v. Bunner held that Jon Johansen had
misappropriated a trade secret (CSS) by breaching the anti-reverse engineering
clause in the DVD license. Bunner was also liable because he posted DeCSS in
concert with Jon Johansen. It is on appeal. By violating the shrinkwrap
licenses, the theory is they improperly attained the trade secret. Posting it
was publicly revealing it. It's hard to argue that CSS is now a protectable
trade secret. She also happens to think the judge's ruling is wrong. Jon
Johansen is in Norway and was 15 years old; it's not clear that Norway's law
covers this or that he's even legally able to contract. She doesn't think the
Bunner case is a strong precedent. DeCSS is also affected by DMCA rules. One
clause says it is illegal to bypass a technical measure used by copyright owners
to protect access to protected works. Another says it is illegal to make,
distribute or offer to the public a technology primarily designed to bypass
access controls used by copyright owners. Another is an anti-tool provision for
copy-controls. There are very limited exceptions for interoperability,
encryption research and security testing. Bypassing copy control isn't illegal;
making a tool to do it is. There is an exception for law enforcement. She's
personally responsible for that. Maybe you don't have such a good legal right in
the first place. The statute is complicated. There are seven exceptions to the
first clause mentioned above. Either they're totally meaningless, or there's an
implied right to make a tool. Only litigation is likely to yield the answer.
There is no express privilege in the exceptions to bypassing an access control
for making a tool. She would like the courts to interpret that there is an
implied right to make a tool.
Jon Johansen had to bypass CSS to make DeCSS
which in the US is now a violation of DMCA. In Universal v. Reimerdes the
distribution of DeCSS violates the DMCA because it is a tool primarily designed
to bypass CSS (an effective technical measure used by movie studios to protect
access to movies). Jon Johansen is a national hero. She was sitting with her
husband in a restaurant in Norway talking about the case. The waitress and
everyone around cheerfully joined in. The interoperability privilege didn't
apply because it's not the "sole purpose" of DeCSS to help with a Linux player.
The judge decided this. The statute only allows if it's the sole purpose. It's
not clear what trying to interoperate with DVDs that are data, not programs,
means. In Universal v Reimerdes, seven movie studios sued Eric Corley and others
for DMCA violations because they posted DeCSS on their websites. The trial judge
found that both posting and linking to DeCSS were violations. Even if Jon
Johansen had qualified for a DMCA exception, Corley didn't because he wasn't
developing a Linux player (he's just a journalist). The trial judge found no
constitutional infirmities in DMCA. The court of appeals heard arguments on May
1. It's not very likely it's going to be totally reversed given the composition of
that court. There are broader implications of the DMCA. There is a need for computer
security testing because you can't always trust what endorsements say about
security. You need to reverse engineer in order to test security. You need to
build tools to engage in security testing. You need to be able to share results
and tools with colleagues and the research community. DMCA has an exception for
security testing but it is quite narrow (you have to get permission first and there
are limits on sharing tools and results).
Pamela skipped over a retelling of
the current status of Felten's run-in with SDMI and the DMCA, because most
people there were up to date on it (and Felten, Dean and Wallach were there).
They didn't violate the access control clause because SDMI technical measures
are not access controls. But they might have violated another clause if they
made or adapted tools to bypass SDMI watermarks. It doesn't qualify for the
encryption or security testing research exceptions (they aren't encryption and
security is defined narrowly). Publishing or delivering a paper may "provide" a
component of a tool to the public and violate that clause. It is not clear that
SDMI watermarks were an effective technical measure within the meaning of the statute.
It may not be effective if it can be bypassed. It is not clear that "component"
was meant to encompass a paper discussing a copying technology. It is possible
that courts might rule Felten is not liable on statutory grounds. But even if
Congress meant the DMCA to reach Felten's paper, the constitution may protect him.
The 1st amendment is the first place to look and the easiest to argue in this
situation. The 1st amendment generally protects a scientist's right to publish
the results of lawful research (the SDMI hack itself was not illegal). It can
still be enjoined if there is a national security concern. In Bernstein v US,
cryptographers have a 1st amendment right to post source code on the net to
communicate scientific ideas on code. Unfortunately the decision has been
withdrawn, so it is not as impactful. The fact that someone might do something
illegal with the information is generally not enough to stop the speech (how to
make a bomb). It has to be an imminent, clear and present danger.
What about
other DMCA violations, such as Microsoft's hack of AOL instant messages,
bypassing access controls to view controversial Microsoft interface
specifications to avoid contractual restrictions, and bypassing Ebay's anti
spider technology? Ebay sued Bidder's Edge about their use of spiders. They were
found to be a trespass on Ebay's server. What about bypassing spyware to protect
privacy? Is there a need for a broader right to reverse engineer? Suppose DMCA
made the SDMI hack illegal or makes other security research illegal. The 1st
amendment may not suffice because much of reverse engineering is conduct.
Article I, section 8, clause 8, gives Congress the power "to promote the progress of
science and useful arts, by securing for limited times to authors and inventors the
exclusive right to their respective writings and discoveries". Is there an
implied right to reverse engineering to promote science? The supreme court looks
at history and the constitution to discern constitutional principles of
significance and applies them. As regards the IP clause, it might find a
disclosure and dissemination principle, a public access and public domain
principle, a competition and innovation principle, and a balance and utilitarian
principle. If we stop reverse engineering, it will slow down innovation. Reverse
engineering has long been legal because of the beneficial effects on innovation
and competition. Innovation whether in art, history, science or technology,
builds on what has gone before; incremental innovation is the norm. Other
countries don't have the same constitution so can only address it at the policy
level. Us "exporting" DMCA in the new EU directive is worse in some ways than
DMCA. There is no encryption research exception, no national security exception;
no exceptions. There is a need for the scientific technical community to help
refine DMCA-like rules to ensure adequate room to reverse engineer. Congress
didn't know what they were doing when they adopted DMCA. They heard piracy, and
said "whatever you want". They wrote a very narrow specific exception for
encryption.
There were questions involving checking of patents and
copyrights and equal protection. It is illegal to circumvent technology to see
if there's infringing material inside encryption. It's clear that there are not
enough exceptions in the law. She would argue for a general purpose exception.
Courts are pretty good at distinguishing for legitimate purposes. Even if law
enforcement has to circumvent it to get evidence, it's not clear that any
mutuality is implied. Courts could interpret some exceptions more broadly, to
get just results. They could also say go tell it to Congress. Another question
was about RIAA suggesting that the program committee might be violating DMCA by
allowing a paper to be published. In theory DMCA applies to anyone who
contributes in any way. If you make or distribute the tool, you're directly
liable. In the Reimerdes case, merely linking, said the trial court, is a
violation of providing, although strictly speaking it just facilitates
providing. In an ACLU brief she coauthored, they say that contributory
liability is not what Congress intended. The program chair in the Felten case
has withdrawn from all program committees because of fear of liability. That's a
nice example of the ways in which the statute has a chilling effect on
legitimate activity. That needs to be documented. Congress can do something to
make it less bad if they are presented with a case. She believes Congress didn't
intend for it to have this effect. In contrast with the situation with the
program chair, the general chair's employer has been extremely supportive and
presented no barriers. The situation has helped to galvanize some civil
liberties organizations. The ACLU wasn't sure how far they'd go in the Reimerdes
case. If you care, make some sort of contribution to the EFF. She's on the
board; they've spent tremendous resources on this. The Law school at UC Berkeley
is looking for DMCA cases to defend. Get in touch with her if you have some
situations.
Someone asked, if linking is a problem, what about searching?
There are lawsuits on search engines right now. None of the safe harbors in
DMCA for copyright apply. Being a lawyer sucks sometimes. You can do careful
crafting of the safe harbors, and it all can get washed away. If you're a telco,
and somebody uses your wires to transport DeCSS, you can't claim an exemption
under the safe harbor. You're distributing and providing to the public. Common
carrier doesn't apply to this sort of stuff. She thinks that courts can get
these things right. In the case of Corley, after getting enjoined, they told
everyone to send the URL information. It's hard for a judge not to say you're
violating my injunction. Judges don't like contempt very much (even if it is
appropriate in some circumstances). [At about this time, someone I was sitting
next to handed me a slip of paper with a quote from a Mae West movie: Judge:
"Madam, are you trying to show contempt for this court?" Mae West: "No your
honor. I'm doing my very best to conceal it." ] If we're really lucky the court
will say it's contributory liability and the statute doesn't cover that. News
places linked to DeCSS sites at earlier points (even the New York Times). Maybe
there's a real first amendment issue. Someone asked whether, in the Jon Johansen
case, the judge said what a tool is. No. It is not an issue in the trade
secrets case. Is CSS a trade secret? The judge said yes. With DeCSS, was there a
misappropriation in violation of shrinkwrap? Yes. What a tool is was not an
issue, though it is important in the context of Reimerdes. There was a question
about what's happening outside the USA. There's no question that some sort of
international cooperation in the technical community to carve out a space for
legitimate activity is important. The strategy of the content industry is to get
the broadest possible law past the US Congress, then use it as a model for an
international copyright treaty. They could have made the act say that circumvention
is illegal only for the purpose of infringing copyright. That would satisfy the treaty,
but we didn't do that. The US is telling other countries to adopt our approach
minus the exceptions. There are opportunities for countries outside the EU to
have some economic advantages. Research labs would go there. Congress would pay
attention. The opportunities are for competitive advantage, not about being a
piracy haven.
Someone said that the security community went through 2
decades of hard fights against other bad security legislation. Doing it again
sucks. How can we get ahead of this sort of thing? It's interesting to Pamela
how much mobilization there was of the scientific and technical community around
export controls and key escrow. It's difficult to mobilize those populations
around digital copyright. Barbara Simons was trying. Some places had other sets
of interests. The back story is that big entities with lots of legislative clout
decided not to exercise it on the DMCA. Hollywood was too dug in and it was so clearly
unconstitutional on its face. AT&T spent all its chips on the good safe
harbor rules for intermediaries. Someone suggested that archives and libraries
must remove information that might be used for this purpose. She wouldn't do
that right yet. The courts haven't interpreted that far. It's a nice way to
illustrate the overbreadth. That speech was lawfully uttered years ago. You need
to stand up for your rights;
it's better than being chilled. Someone asked
her what she would write, and where she would draw the line. It's a great problem.
It's a very difficult area even though she's critical of the rules. There were
rules about cable descrambler boxes. All were specific to certain domains. Part
of what's upsetting about DMCA is it's across the board. The people who drafted
it didn't think it had anything to do with law enforcement. She told them it
could provide the basis for shutting down the NSA. She was trying to
illustrate that it's a bad law and that they should try to draft it more narrowly. A
knowledge or intent requirement that covered use for illicit infringing purposes
and no other essential use would be good. A group of technologists could sit
down and try to draft a law that did the right thing. At the moment, there is
not a difference between a technology for fair use and one that infringes.
There's not a single instance of piracy; there's no proof of infringing movies.
If we get a ruling that it doesn't matter that no infringing copy was made, that would
be bad.
The session on Cryptographic Protocols 2 was chaired by Avishai Wool (Lumeta Corp).
"Graph-Based Authentication of Digital Streams" by Sara Miner (UC San Diego) and Jessica Staddon was presented by Sara.
The goal is efficient source authentication of data for multicast,
continuous-feed streams. A simple solution involves a signature scheme. The
sender signs each packet individually then appends the signature to each packet.
This introduces significant overhead. With the message authentication code
scheme, the sender shares a symmetric key K with all receivers. The sender
appends a MAC to each packet. There is no source authentication because all
parties share a key so any of them can generate a valid MAC. Previous work uses
a signature on one packet. They then "link" other packets to the signature
packet using some more efficient mechanism. The authentication information is
faster to compute and verify and the packet size stays small. Linking each
packet to the one before it, the entire stream content must be known in advance,
and the loss of one packet means no later packets can be authenticated. You can
break the stream and send one signature per chunk. Schemes using MACs
require that the entire stream is known ahead of time and that there is time
synchronization. Enhancements send the key downstream in a later packet, but
still have a loose time synchronization requirement. If it reveals the MAC key
in a later packet, the revealed key cannot be sent too soon after the packet that
was MACed with it. Otherwise, an adversary could tamper with data and recompute
a MAC. Most related work uses hashes and signs one packet. They differ in the
loss tolerated. One does multiple MACs per packet; the overhead per packet grows
with the number of receivers. Selecting a scheme involves considering the
overhead per packet, the authentication delay/packet buffering at the sender or
receiver, the type and amount of packet loss tolerated (random (independent)
packet loss or bursty loss), and non-repudiation. Graph-based representations
are an easy way to describe many existing authentication schemes. They allow the
scheme properties to be easily measured. An edge from packet I to packet J means
if packet J can be authenticated, then packet I can be. For example, the edges
represent hashes. To determine if a particular packet can be authenticated,
remove the nodes which correspond to lost packets. It can be authenticated if
and only if a path exists in the remaining graph from the packet to a signature
node. Their idea is, to combat random packet loss, they introduce randomness
into the construction. They construct a graph such that each possible "leftward"
edge exists in the graph with probability p. The expected in-degree of node I =
(n - I) p. Without any packet loss, the probability that packet I is connected
to packet 1, depends on I, and goes up as I goes up. If packets are lost
independently, then the farther away the packet, the better the probability that it can
be authenticated. They then want all packets to have some minimum authentication
probability. But increasing the edge probability increases overhead. The packets
closest to a signature have the lowest authentication probability. Their idea is
to find a packet I that has a good enough probability, then increase the
probability for the nodes between I and the signature. At each non signature
node left of packet I, add new rightward edges with probability p. These new
edges may be hashes or MACs with a key revealed later. They also address a
bursty loss model, where for each packet a burst of length b occurs with
probability q. The new property is packet prioritization. The user can specify
priorities on each packet. Their scheme guarantees higher authentication
probability for higher priority packets, by protecting them against longer
bursts and more bursts. For single burst tolerance, the distance between the
outgoing edge endpoints dictates the length of a burst tolerated by a node. For
higher priority nodes the outgoing edge endpoints are farthest apart. The
original scheme can actually tolerate most long bursts, just not at the
beginning. For multiple burst tolerance, they add extra edges and space them the
same way. The user chooses the desired scheme properties and their schemes
accommodate them (within limits).
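A minimal sketch of the basic graph check may help: build leftward edges with probability p, remove lost packets, and ask whether a path remains from a received packet to the signature packet (taken here as packet 0). The burst-tolerance and priority constructions from the paper are not modeled, and the parameters are invented.

    # Sketch of the graph-based authentication check described above.
    import random
    from collections import deque

    def build_edges(n, p, rng):
        # edge (i -> j) with j < i means: if packet j authenticates, so does i
        return {i: [j for j in range(i) if rng.random() < p] for i in range(n)}

    def authenticable(i, edges, lost, signature=0):
        if i in lost:
            return False
        seen, queue = {i}, deque([i])
        while queue:
            node = queue.popleft()
            if node == signature:
                return True
            for j in edges[node]:
                if j not in lost and j not in seen:
                    seen.add(j)
                    queue.append(j)
        return False

    rng = random.Random(1)
    edges = build_edges(n=100, p=0.1, rng=rng)
    lost = set(rng.sample(range(1, 100), 20))          # 20% random packet loss
    ok = sum(authenticable(i, edges, lost) for i in range(1, 100) if i not in lost)
    print(f"{ok} of {100 - 1 - len(lost)} received packets authenticated")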
One question brought out that application
profiles were not available to help applications choose a scheme. A questioner
asked about losing the first signature. It can be retransmitted multiple times
and they can allow for a request to retransmit. A question about error
correcting codes spread within the authentication material brought out that
there is a bit on this in the paper.
The final paper was "ELK, a New
Efficient Protocol for Large-Group Key Distribution" by Adrian Perrig, Dawn
Song, J. D. Tygar (UC Berkeley). Adrian presented.
Information broadcasts
such as stock quotes and radio broadcasts are gaining popularity. Information
distributors want to deliver only to paying receivers. Today they use secret key
encryption and large scale key distribution. With group key distribution, the
key server distributes keys, and receivers join/leave at any time. They require
forward and backward secrecy. The broadcast group can be re-keyed after each
join/leave. The challenge is scalable re-keying. There are backward secrecy and
forward secrecy issues with joiners and leavers. There is a lot of prior work.
Except for Keystone, it didn't look at the reliability of the key update
messages, which is crucial. The problem is that the re-key broadcast message may
get lost. A unicast recovery protocol is not scalable for very large groups. For
reliable broadcast (neighbor recovery), scalability is an area of active
research. Keystone used forward-error correction that assumed packet loss in the
Internet is independent. It's often correlated; the probability of a
subsequent packet loss after a packet loss is higher. Unicast key recovery is
slow (>100 ms RTT). A workstation can do 10^6 one-way functions in 100 ms.
You can trade off computation with communication to get reliability and lower
communications cost. A key tree is used as in most previous work. Each member is
a leaf node. All members know the group keys above them in the tree. ELK
provides a broadcast free rekey on joins. They begin by assuming that a receiver
can do 2^S pseudo-random function (PRF) operations quickly while the attacker
cannot do 2^2S PRF operations. In the paper they generalize to an attacker of
arbitrary strength. On a leave, they update all the keys up the tree from
the leaver. The function at the parent depends on the two child keys. The parent
picks a random key for the closest one that needs to change and propagates the
changes up. The parent key takes two contributions from two keys, using the
child keys and applying them to the previous parent's key. The contributions are
S bits each. The key server broadcasts contributions encrypted with the
unchanged keys and does key distribution with a hint. The hint key update is
compressed to S bits. They apply a one way function to the key to distribute it
and members recompute the key using the hint. For the problem of a lost rekeying
message, they add a small key hint to data packets. This trades off computation
with communication. The receiver does 2^S extra operations and the rekey
broadcast is S bits shorter. The receiver reconstructs the key faster than
contacting the key server. You can fine tune the security level by adjusting S.
It is robust to precomputation attacks. It has reliability and low communication
overhead. They want to analyze the reliability in real-world settings.
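The computation/communication trade-off behind the key hints can be illustrated with a much-simplified sketch: the server withholds an S-bit contribution and broadcasts only a short checksum, and a member who already knows the old key recovers the new key with at most 2^S trial computations. This is not the actual ELK key derivation; the names, the SHA-256-based PRF, and the parameters S and CHECK_BITS are invented for the sketch.

    # Simplified illustration of ELK's key-hint trade-off (not the ELK construction).
    import hashlib, secrets

    S = 16                      # receiver work factor: at most 2^16 trial computations
    CHECK_BITS = 32             # checksum length; sized so false matches are unlikely

    def prf(key: bytes, msg: bytes) -> bytes:
        return hashlib.sha256(key + msg).digest()

    def checksum(key: bytes) -> int:
        return int.from_bytes(prf(key, b"check")[:CHECK_BITS // 8], "big")

    # Key server: refresh the key with a secret S-bit contribution.
    old_key = secrets.token_bytes(16)
    contribution = secrets.randbelow(2 ** S)
    new_key = prf(old_key, contribution.to_bytes(4, "big"))
    hint = checksum(new_key)                 # the only rekeying data broadcast here

    # Member: already holds old_key; brute-force the missing S bits.
    def recover(old_key: bytes, hint: int) -> bytes:
        for guess in range(2 ** S):
            candidate = prf(old_key, guess.to_bytes(4, "big"))
            if checksum(candidate) == hint:
                return candidate             # false positives are unlikely with a
        return None                          # 32-bit check over 2^16 candidates

    assert recover(old_key, hint) == new_key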
Questioning indicated that they were going to bring their work to IETF.
There was a question about the rate of joins and leaves in reality. For joins,
ELK will tremendously help, since they are quite cheap. Leaving is much more
difficult. You can aggregate leaves and the key update will become smaller. If
it's a highly dynamic group with 1 million members, scalability is still a
challenge. A question wondered about its applicability to subscription based
services. Other questions centered around its utility in the face of traitors
and credit card payments. Is there a great deal of urgency to get a millisecond
vs. one-hour turnaround? Aggregation is discussed in the paper for just that
reason.