Writing Secure Code
by Michael Howard and David C. LeBlanc
Microsoft Press 2002
ISBN 0735617228, 2nd edition
Reviewed by Fred Cohen September 15, 2003
If I were the person at Microsoft responsible for allowing books to be released I would not have approved the book by Michael Howard and David LeBlanc. But the difference between what I think such books should address and what Microsoft thinks such books should include is as telling as anything else about the book. In essence, this is a book about how Microsoft has screwed up security in their programming practices over the years and how they are trying to fix it. While I have high praise for the efforts of the authors to make these changes at Microsoft, the outcomes, approaches, and attitudes demonstrated in the book lead me to have low expectations about the future of Microsoft in producing more secure systems. To begin being a bit more specific, this book, and Microsoft's approach to addressing these issues, lacks the most fundamental sorts of models that are required for success in addressing any such problems. As a result, their effort seems to me to be doomed to failure. The ease with which they distance themselves from theory and analysis is symptomatic of the problems they have created because of their rush to produce features without a comprehensive approach to design, and it shows in the book just as it has shown in their operating environments for many years. At the same time, the book and the approach that it describes are not without merit and I think there is value in considering many of the things that they say as part of a reasonable approach to improving software quality at Microsoft - but not to writing secure code.
From the start, it is clear that this book is about Microsoft, their approach, their software, their environments, and their way of doing things. They are dominant in the operating system sales arena and more or less own the desktop for many of the world's user population, so there is some real validity in this focus. In addition, the lack of other available information on security in Microsoft environments makes this a more valuable book than it would otherwise be. On the other hand, I would like to think that the challenges of writing secure code go beyond Microsoft, but probably Microsoft does not, and why should they? The 'not invented here' syndrome has not escaped Microsoft, but this book shows strong awareness of other environments, even if it largely ignores them except when pointing out that they have some of the same problems that Microsoft has - another valid point.
The misinformation in the book starts in the 2nd sentence of the first paragraph of chapter 1 when we are told that security wasn't important before the Internet grew in importance and that the worst result would be that people could attack themselves. The first network-based global computer virus reached mainframes around the world in the late 1980s - and it did not involve the Internet. The authors ignore a great deal of work done by many researchers over a long period of time, but this is not unusual. The general level of scholarship in the computer security field has been lacking lately. In fact, relatively speaking, if the Internet were all that was out there, they would be doing a good job of finding prior work related to Microsoft. They miss lots of related work like most of the work sponsored by governments or done by academia, which is the vast majority of work done to date, but they are from the computer industry and we cannot expect them to understand such things. Factual accuracy, careful scholarship, and detailed consideration of the underlying issues are lacking throughout the book, but what should we expect from a typical trade book anyway? We are told that other folks are dumb or miss the point or are irrelevant, and so forth. I suppose that this is part of the Microsoft culture that starts at the top, but ignoring thoughtful understanding at the expense of getting it wrong is like cutting off your nose to spite your face.
Indeed, chapter 1 demonstrates just how poor the internal culture at Microsoft is. Such awareness activities as getting the boss to send an email and nominating an evangelist are really not about writing secure code as much as internally about convincing Microsoft to do what you want. It is valuable for salespeople no doubt. The lack of a basis for design shows up in Chapter 1 and it permeates the book. The four principles selected in the book are hardly what I would choose, but then I was not choosing. "Principle 2 - the defender can defend only against known attacks; the attacker can probe for unknown vulnerabilities" - an excellent example of why you need a design basis and how the lack of one leads you down the wrong path. "Principle 4 - The defender must play by the rules; that attacker can play dirty!" - what rules are those? My view is that this lack of a design basis, the lack of deep understanding of issues and a way to approach dealing with them, and the lack of a theory and a practice underlies the problem that Microsoft and much of the current programming community has with writing secure code. This book does nothing to solve these problems.
Chapter 2 misses by so much it's not even funny. The authors give fundamental misimpressions, for example, that secure software is equivalent to risk management. They also come out with wisdom, for example, that more eyes do not make you see better (they are telling us that just because more people see the source code doesn't mean that those people understand more about finding security problems) - and they are right. They tell us that no more Trojan Horses will be allowed in Microsoft software. It took them long enough; they used to call them "Easter Eggs", a public relations stunt to make it seem palatable, and one that worked in the large for many years. But this is a good thing and I am glad they finally decided to do this. They tell us "geeks love prizes". This is downright insulting to professional software designers! I like to be appropriately rewarded for my efforts. So do most other professionals. But the idea that trivial prizes or internal contests will significantly improve quality over time is not well supported and is likely to be counterproductive in many cases. I liked this one: If a security flaw is found, the person who wrote the code should fix it. Of course they couldn't do it right the first time, so it is unclear that they will do it right the next time, but it's a good policy nonetheless because it helps the programmer learn from their mistakes and slows the productivity of those who make more detected mistakes. It also implies accountability, which is, as far as I can tell, a new thing at Microsoft in terms of software development. I think that the tester who passed the code and the executive that hired that programmer should also have some additional work and performance hits along the way.
Chapter 3 gives us security principles to live by. Lots of good stuff here - and lots of inconsistencies as well. It tells us to do it right the first time - and that it evolves so you have to change it with time. It provides no feedback mechanism in the process either. It tells us to remember that backward compatibility will always give you grief, which is why they discourage it. That's right, Microsoft does not want backward compatibility, but we all knew that a long time ago. Of course the lack of backward compatibility also implies that they didn't do it right in the first place and won't do it right now, or alternatively that there is no right, just eternal change. While this is a good business model, it is a poor information protection approach. Which may be the real reason that Microsoft does so poorly in this arena. They tell us that the most valuable part of a bank is its vault - did they miss the information age somewhere? They probably never worked on bank security. These days, the computers have far more value than the vaults (except the vaults that hold the computers of course). They serve us up with code examples that are incomplete and then tell us that the trick we missed is in the missing part of the code we did not get. They give poor examples of different problems and don't follow their own requirements in their sample code. But this is to be expected - nobody could actually do all of the things they require in all of their software rules - and nobody does. For example, one of their rules is 'log every failure', but they don't tell their programmers how to log a failure in the logger. Their code examples also fail to log the very conditions they indicate are important to log. Another one of their rules is to assume that the attacker knows everything you know - except the secrets of course. If you believe these folks, mixing code with data all started with Lotus 123.
They give lots of advice - much of it good - some of it poor - none of it with a real basis provided.
Chapter 4 tells us to use flow charts. It is an old technology but something we have lacked in many designs of late. It also tells us that STRIDE and DREAD are the models of threats and consequences they use at Microsoft - which helps me to understand why they miss the boat so often. You need to get the book for more details because I need to cut down on the content of my review before we all run out of patience with it and the book. The book is almost 800 pages, by the way, and it does have a solid 40 pages of worthwhile content, not a bad bloat ratio for some software products. Chapter 4 is also very important to understanding where Microsoft still misses the boat in security. They didn't spend the time needed to do basic modeling and, as a result, their views and processes are incomplete, inconsistent, and lacking in a systematic approach. This is consistent with their expressed view of applying theory where appropriate - that you should never do so. That's part of why they will continue to make big stupid mistakes from time to time. We all do of course, and perfection is not the goal, but it should be something we keep in mind along the way. They ignore lots of things because of the lack of a model and it shows again and again.
Chapter 5 marks the start of part 2, and for the most part, the next several chapters talk about the same problem: separation of data from control. If the authors had realized this they could have saved a lot of time and effort and organized things a lot better. But they didn't, so we get, instead, chapters on buffer overruns, determining access control settings, and least privilege. Of course these topics should not be fully consumed under separating control from data, but they are in this book because that's what the authors encounter in their practice, and anything they don't directly encounter is apparently ignored, of course leading to the problem that if you ignore such things they are often masked as other things you don't ignore, and you never really get it. The key to understanding this part of the book is understanding that the authors are in a race with their programmers to see whether they can teach the staff to stop doing silly things before the staff can do more of those silly things. The practical concern is stopping the use of unlimited input to limited input arrays, setting access controls so that application programs cannot overwrite other programs and control files, and deciding what access rights are necessary for each program then trying to match these to the available operating system settings. The reason they need to do this is that the programmers grew up in a Microsoft environment where they always ran with full privileges and therefore never taught themselves the discipline of thinking about how to do things safely in the presence of controls. The authors have a mighty task in front of them, and I do not envy them that task.
The more senior programmers at Microsoft are apparently worse at this than their junior partners because they have been at it longer. Despite the possible advantages of fresh blood and the history of Microsoft as coming from youth, the authors have little respect for the good ideas the junior programmers bring them from academia. This appears to be a corporate cultural issue. Microsoft is aging with its creators and they have failed to recognize their own youthful influence on changing the market. As a result, they are ignoring the great value of their youthful new employees. They tell us, indirectly, that the wise old programmers have it right. As a wise old programmer myself I will have to agree that my experience and that of other experienced software designers I know are quite valuable, except that they then show us how their wise old programmers keep getting it wrong. Most of the folks I think of as wise old programmers don't make the same mistakes again and again. Of course less experienced people make more mistakes, but that doesn't mean that they bring nothing new to the table.
If you don't know why C leads to off-by-one errors that lead to storage errors that lead to programs doing bad things, these chapters are worth reading. If you like examples without all the facts to make the point, but lots of lines of code showing how to set access controls in Windows, this section of the book is for you. It is not for the same people that section 1 was for, but the audience shift should be obvious enough for most readers to ignore one part or the other appropriately. My summary note on these chapters says "Bad design + bad programmers => Bad code". I think that is telling.
Part 2 continues with chapters on cryptography. Here the authors make some good points while missing some others. One of the good points is that universities tend to teach students cryptographic algorithms and mathematics and tend to ignore the practical issues of their utility. The mystery cryptographic thing will save us - not. The authors show lots of ways that cryptosystems are defeated in practice and this is good of them to do because almost nobody tries to break modern cryptosystems by analyzing the mathematics of RSA and coming up with a better way to factor large numbers. Instead, they break into the weak operating system that holds the keys and take them - or tell the weak protocol to fall back to an easy-to-break system - or fool the user into a bad decision based on a cryptic prompt. Most courses in universities focus on the mathematics of cryptography and this is a problem that should be solved. Of course most of the professors teaching these issues are basically mathematicians, so that's a big part of the issue.
The authors rightly spend a substantial amount of time on generating decent pseudo-random numbers and tell those that didn't know it yet to stop using the functions provided with languages because they produce random numbers that are cryptographically poor, even if they are good for the purposes they were originally designed for. They do a decent job here and should be applauded for it. Of course their solution is a Microsoft system call that they assert does it right. I should note that the parameters they claim to use to generate their random seeds contain some very predictable values and sets of values that, while individually not very predictable, may be more predictable together. For example, when we add the CPU, User, and I/O time it may come to a predictable value (100%) even if each is not very predictable on its own. They do tell us to use salts for hash functions to store things like passwords. Their inability to get the job done at Microsoft shines through when we realize that the password scheme used in Microsoft products didn't use salts for their hashes and resulted in a widely published dictionary-based attack based on this weakness. It saddens me to see that even when the authors get it right the company gets it wrong.
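To make the random-number and salting points concrete, here is a minimal Python sketch. It is neither the book's code nor Microsoft's scheme: the function names, the choice of PBKDF2 with SHA-256, and the iteration count are all assumptions made for illustration. It shows that a general-purpose generator is fully reproducible once its seed is known, and that a per-password random salt makes two identical passwords store different digests, which is what defeats a precomputed dictionary.

```python
import hashlib
import hmac
import random
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor (an assumption, not a recommendation)


def predictable_sequence(seed: int, count: int = 5) -> list:
    """General-purpose PRNG: the same seed always yields the same values."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(count)]


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh salt is drawn from the OS-backed CSPRNG."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Calling hash_password twice on the same password gives different salts and different digests, so an attacker cannot hash a dictionary once and match it against every stored entry; each entry has to be attacked separately.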
Chapter 9 has an especially good quote - "Don't run code you don't trust" - and they also tell us lots of reasons not to trust Microsoft's code. Another good point they make is that compilers optimize out security functions. A good example is optimization removing the code that clears buffer contents after use so that residual data like passwords are no longer available to attackers. This has been well known for many years by those of us who have found ugly work-arounds. Nonetheless, the compiler designers missed this one and still do today. They also tell us that '.NET' will fix all of this - but I don't buy it for a second. The eternally increasing complexity of the security functions provided in these operating environments does not make them more secure. While the authors tell us that security features do not make systems secure, they seem to have abandoned the simplicity principle altogether in favor of lots of complex security features with complex settings that cannot be gotten right even by the authors of the book that tells us how to do it. I guess this is just how it goes. If we can't do simple things right, why do we think we can do them right by making them more complicated?
Chapter 10 tells us that all input is evil. Despite the judgement here, the thought is about right from a security standpoint. I would likely change 'is evil' to 'may come from malicious sources', but careful is not a word I would attribute to the authors of this book in their writing style. They approach the issues with reckless abandon, and that's entertaining at a minimum. The fundamental issue of chapter 10 is that data and control must be separated - where did I hear this before? They tell us how to check input syntax validity, predominantly with a view toward preventing the inclusion of meta-characters that can be interpreted so as to execute arbitrary attacker code. Even this seemingly simple issue shows us another major lapse in the book, the lack of sequential machine models and appropriate controls. You may not believe this, but the book fails to address sequential machine issues across the board and focuses entirely on combinatorics issues under stateless machine assumptions. This is not by intent, as there is no underlying model in the book. They just missed the basic notion that we are dealing with sequential machines. And of course asynchronous issues between communicating sequential machines never even hits their radar. We are told to check input validity by verifying syntax, but the use of redundant values on input to cross check validity is ignored. Input syntax is addressed, but semantics are ignored, and more particularly, we are not told how to build syntax filters that allow different syntactic elements based on previous inputs and program states. The whole field of input is covered by regular expressions on single inputs, ignoring the whole area of parsing input sequences for validity in context.
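The distinction between checking single inputs and checking input sequences can be sketched in a few lines of Python. The protocol below (HELLO, LOGIN, GET, QUIT) is entirely hypothetical and invented for illustration; it does not come from the book. Each message is still checked with a regular expression, but which expressions are permitted depends on the state reached by the previous inputs.

```python
import re

# state -> list of (pattern allowed in that state, state to move to on a match)
# The message formats and state names are hypothetical.
TRANSITIONS = {
    "start": [(re.compile(r"HELLO [a-z]{1,16}"), "greeted")],
    "greeted": [
        (re.compile(r"LOGIN [a-z]{1,16}"), "authed"),
        (re.compile(r"QUIT"), "closed"),
    ],
    "authed": [
        (re.compile(r"GET [a-z0-9_]{1,32}"), "authed"),
        (re.compile(r"QUIT"), "closed"),
    ],
    "closed": [],  # nothing is valid after the session ends
}


def validate_sequence(messages):
    """Accept a message list only if every message is valid in its context."""
    state = "start"
    for message in messages:
        for pattern, next_state in TRANSITIONS[state]:
            if pattern.fullmatch(message):
                state = next_state
                break
        else:
            # No pattern for the current state matched: reject the whole sequence.
            return False
    return True
```

A message such as "GET file_1" passes a stateless syntax check, yet validate_sequence(["GET file_1"]) rejects it because no login preceded it. That is the kind of check a single-input regular expression cannot express.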
We are then told about canonical representation issues in Chapter 11, and told that the multiplicity of ways that the same content can be represented creates problems for security because syntax checks are harder with multiple representations. A valid point, but of course the whole problem with regular expression checking of syntax has everything to do with the lack of a common representation and formal language approach to input validation. The use of different representations by different programs that are all part of the path of interpretation of data betrays a lack of a common method for representation and set of routines for handling them. This is a fundamental lapse in design that leads to multiple implementations of the same analysis routines each with its own set of errors, and each rewritten and rerun at high cost both to design and to computer time and space during execution. The obvious solution, a uniform syntax interpretation mechanism built into the environment and used by all software in the system, is not even brought up. And of course sequential issues are again ignored.
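A small Python sketch may help show why multiple representations defeat syntax checks. The root directory, the function names, and the percent-encoded example are assumptions for illustration, not material from the book. The naive check inspects the raw string; the second check reduces the input to one canonical form first and only then makes the decision.

```python
import posixpath
from urllib.parse import unquote

ROOT = "/srv/public"  # hypothetical document root


def naive_is_allowed(raw_path: str) -> bool:
    """Checks the raw text only, so an encoded '..' slips through."""
    return ".." not in raw_path


def canonical_is_allowed(raw_path: str) -> bool:
    """Decode once, normalize, then check the canonical result against ROOT."""
    decoded = unquote(raw_path)
    candidate = posixpath.normpath(posixpath.join(ROOT, decoded.lstrip("/")))
    return candidate == ROOT or candidate.startswith(ROOT + "/")
```

The input "%2e%2e/%2e%2e/etc/passwd" contains no literal "..", so the naive check accepts it, while the canonical check resolves it to a path outside the root and rejects it. The sketch assumes the decoded form is what later code actually uses; if another layer decodes again, the same problem reappears, which is the argument for one shared interpretation routine.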
The authors ask us "When is a line not a line?" The answer should be, when we don't separate data from control in output from programs, but that's not their answer - they go at it all piecemeal. While the authors identify the problem, they don't help us solve it, and they don't put it into context. They tell us not to make decisions based on names, but they also tell us that Microsoft uses names for most of their decisions and that there is no way out of this. Of course you could walk away from Microsoft, but let's not be silly. Unix systems have the same problems because web browsers use filenames to determine what content types to associate with output. All in all, Chapter 11 has good examples but poor solutions and fails to address the basic issue of representation and, of course, separation of control from data.
Chapters 12 and 13 are the same thing as chapter 11, repeated in the context of databases and web servers. In other words, they only give more examples of the same mistakes producing the same sorts of errors in different application environments. Useful for those who didn't get it the first 10 times, redundant for the rest of us. Finally, thankfully, chapter 14 tells us to use Unicode for representing everything. Of course this is based on internationalization issues, not security issues, and ends this section of the book. After 325 pages, I found myself wanting more for less.
In Part 3, my hopes were dashed. Yes, part 2 continues in part 3. The separation is apparently only a trick to meet an administrative requirement of maximum section sizes, or perhaps a limitation of Word based on an integer overrun. The introductory picture has a laser shooting at a spatula, about as relevant to the issue of secure coding as much of the content. Chapter 15 does a poor job of handling network issues with the exception of providing some reasonable advice on building firewall-friendly applications. Chapter 16 tells us 50 variables to set to specific values in RPC and Kerberos code (why they don't set these by default I don't know, but expecting Microsoft to do what the authors advise is expecting too much). Finally, chapter 17 tells us to protect against denial of service attacks by invoking quotas on everything and asserting that IPv6 will help to save us. It also gives us a really bad example of using software performance profiling instead of complexity analysis to find possible denial of service exploits. This is the worst example yet of ignoring academic results in favor of inferior industry methods. In particular, a junior programmer is told to ignore all that complexity theory he was taught in the University and simply test each of the routines under different inputs, find the slow routines, and speed them up. Of course in a denial of service scenario, if there is a high complexity function that is fast in almost all cases, a good attacker will find the worst case input sequences and exploit them while the testing scheme will almost certainly miss these cases unless they do complexity analysis. This section doesn't even talk to the basics of deadlock, an area where theory has existed for a long time and is very useful in mitigating attacks. 
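The gap between profiling and complexity analysis can be illustrated with a short Python sketch. The routine chosen here, a quicksort that always takes the first element as its pivot, is my own example and not one from the book; it stands in for any function that is fast on typical inputs and slow on inputs an attacker can construct. The code counts comparisons rather than measuring time so that the result is repeatable.

```python
import random


def pivot_comparisons(items):
    """Count comparisons made by a quicksort that always pivots on the first element."""
    comparisons = 0
    pending = [list(items)]  # explicit stack instead of recursion
    while pending:
        chunk = pending.pop()
        if len(chunk) <= 1:
            continue
        pivot, rest = chunk[0], chunk[1:]
        comparisons += len(rest)  # every remaining element is compared to the pivot
        pending.append([x for x in rest if x < pivot])
        pending.append([x for x in rest if x >= pivot])
    return comparisons


def profile_random_inputs(n, trials, seed=0):
    """What test-and-measure sees: the costliest of a batch of random inputs."""
    rng = random.Random(seed)
    worst_seen = 0
    for _ in range(trials):
        data = list(range(n))
        rng.shuffle(data)
        worst_seen = max(worst_seen, pivot_comparisons(data))
    return worst_seen


def adversarial_input(n):
    """Already-sorted data forces the quadratic worst case for this pivot rule."""
    return list(range(n))
```

For sorted input of n distinct values the count is exactly n(n-1)/2, while random trials stay near n log n, so profile_random_inputs(200, 50) reports a figure several times smaller than pivot_comparisons(adversarial_input(200)). Random testing would label the routine fast; the analysis says an attacker need only send sorted data.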
In chapter 18 we are told how to write secure .NET code, but of course it ignores all of the previous lessons and tells us to use security features that extend trust from domain to domain without a good basis. Ah well, what should I have expected from the last chapter in this section. They would have done better to cut this whole section out.
Finally, in section 4 - on page 567 - under 'Special Topics' - they cover protection testing, which they called critical and fundamental in the introduction to the book. And finally, they start to begin to put a model on the security issue that can be applied in a rational way to systematically approach the challenges we face. Unfortunately, their start falls short by ignoring things like coverage, not carrying the model through to their long checklists, and giving us examples of hiring a tester who clinched the job based on the statement that he could "break anything that opened a socket". If I were interviewing him, I would have given him an example of something that opens a socket and told him that he could only get the job if he could 'break' it (whatever that means). I would give him the next year to succeed and after a year of effort he would have learned not to make such bold statements in the context of my workplace. This is not Microsoft's way and of course the statement may be true in their environment. At Microsoft, brashness wins, flowing from the top. Yet, despite all of my complaints this is the best chapter in the book by a long way. So save yourself the full cost, go to the bookstore, and read chapter 19, ignoring the code examples - which are useful but not really all that good, and stopping after the first few pages so as to avoid the checklists.
In chapter 20, on page 618, the first mention of complexity issues comes up, but fear not, they won't address it in depth. They simply tell us that things get complicated in large applications. They continue their examples based on inadequate information supplied to the reader and assumptions based on the version of C being used. They continue to tell us how to get around poor design by convoluted security implementations, and continue the 'My 3 Bugs' approach to teaching us about security.
Chapter 21 does no better. They tell us about secure software installation by telling us, in essence, that installation software missed that security thing. Then comes the really hokey advice. Like that systems administrators should be able to alter application programs. Really? Since when should users who are the "administrators" on their computers be able to alter the binary code of an executable from a vendor? What they really mean is that the security mechanisms are inadequate to do what we want to do, so we have to make bad approximations and poor assumptions in order to get things to work. And why should we put keys to codes in the system-wide registry file instead of a file that is protected from read by others? It's all about poor assumptions, and there are lots of them here to complain about. The lack of stepping back and looking at the real issues underlies the poor advice in this chapter. But fear not, there is worse to come. Chapter 22 is about legal issues in privacy, but it doesn't even do that well. All it really does is pile more mindless data on the reader without the context to apply it well. Chapter 23 tells us about 'good practices', which is better than trying to tell us that these are 'best practices', but still not adequately downgraded to reflect the reality. We are given a paragraph on what not to tell attackers, and it deserves a chapter at least. We are told that ANY application running on the desktop can be taken over by ANY OTHER application running on the desktop! This is a huge hole in Windows software, but the advice is simply not to use the desktop - impractical for Windows. They tell us not to use banner strings on servers, but then tell us that this is hopeless because all of their software uses these banners to differentiate between how to deliver services. 
They are right, we should have and use standards that allow representation to be product independent, but of course Microsoft is the company that brings you proprietary versions of everything to keep you from buying other vendor products. They tell us not to rely on users to make good decisions, and rightly so, and they provide us with lots of examples of their products doing this job poorly. It is brave and I like it. And yet, all in all, this was a poor chapter that again demonstrated the lack of organization or organizing principles in this book.
Finally, chapter 24, on documentation and error messages, is a strong closer for this book. This is a chapter that should be in security books and this book did a good job of trying to address the issues. The attempt largely failed, but it should be applauded nonetheless and should be read by anyone writing a book on security as a starting point for their chapter on this subject. But the problems in this chapter start early. On the second page they tell us not to use security through obscurity, after having told us in the prior chapter not to tell attackers anything. A few pages later, they tell us to not reveal anything sensitive in error messages, then give us what they think is a good example of telling the attacker that the password they just tried was wrong only because some of the characters were in the wrong case. Of course this eliminates the value of using case sensitive passwords in the first place, telling the attacker a great deal of useful information by reducing the search space by several orders of magnitude, but they seem to have missed that. They have some good points to make and make them, but it would be better not to hold up big mistakes as good examples. The appendices give us checklists including functions that are dangerous from a security standpoint, but add little to the book or its value. The book ends unceremoniously.
My roll-up of this book is less than complimentary, and yet I like the authors and think they are smart and thoughtful people who work hard at doing their jobs well. I think that the lack of time and attention to the underlying issues, the lack of organization and models, and the inconsistencies and poor advice are all related to spending too little time thinking through the issues and organization of the book. This is reflective of the same corporate culture that led to the problems with security at Microsoft and in other software vendors. It links to the lack of respect for academic research and results we see in industry, and the conceit that was once associated with academia spreading to the computer industry. While I do not think that academia gets it right all of the time, it is clear that industry fails magnificently as well. Another facet of the problem is the lack of thoughtfulness and modeling in this book and the computer industry. We see thoughtfulness in other fields of inquiry and by those who are thoughtful in the security arena, but it is clear that thinking about issues and working the problems at a deep level is not going to come from Microsoft. While this book has interesting items and details that dispel many of the common misimpressions that programmers have about how exploitable errors in code are, it fails to help them understand the real issues in building secure programs. As such it is misnamed. A better name might be something like: "How poor quality programmers at Microsoft have produced hundreds of instances of the same 10 big mistakes in their code, and how they can do their jobs a little bit better".