A couple of words about TDD
Unit testing is supposed to be one of the most significant methodological achievements of the industry over, let’s say, the last 15 years. The Internet is full of enthusiastic exclamations [1, 5, 6]. However, there are also some less enthusiastic voices, and some with no enthusiasm at all [2, 3, 13]. This article is a humble attempt to weigh the pluses and minuses of TDD (test-driven development), based on a literature review and some practice.
First, it is necessary to clarify the terminology, because there is some confusion: by "TDD" one person means one thing and another person means something else.
TDD is one of (many) agile engineering practices. TDD means writing tests before writing the code. Let me stress it: TDD is just an engineering practice, nothing more. TDD can therefore be used not only in agile but also in iterative development (ITL = iterative test-last) or in waterfall, for example. Nor does it mean that TDD has to cancel other traditional ITL practices such as writing a requirements specification, architecture development, testing and so on. Unfortunately, some developers try to stretch the word "TDD" to cover the whole of agile. For example, [4] contains the following statement: "TDD handles changes in requirements better than predictive approaches". Here is one more quotation: "Switch from traditional design, implementation, testing development process to TDD..." To my mind, this is the very same attempt to substitute the concept of agile for the concept of TDD. Anyway, we are not going to dig into statements of this kind in this article.
Test First (TF) is a synonym for TDD, so we can assume that TDD = TF. Test Last is, unfortunately, one more blurry term. TL (also referred to as ITL) is the standard design-code-test process. So in TL we may assume that unit tests are created after coding. Hence TL != TDD by definition.
We are going to review the following topics:
- TDD and code quality
- TDD and development speed
- TL vs TF
TDD and code quality
This looks like one of the most topical and probably one of the most debatable questions. Unfortunately, none of the articles on the subject that I am aware of (please see the references) gives a universal definition of what "code quality" is. For instance, [8] means defect density per KLOC, whereas [9] holds that code has to meet certain metrics (cyclomatic complexity, DIT, RFC, etc.).
Let’s start with defect density. We can see some pleasant uniformity here: [7, 8, 14] tell us that the number of defects decreases by anywhere from 50% to a factor of 2.6-4.2 (!). But honestly, some questions remain:
- The projects must really be comparable
[8] reports the number of defects decreasing by a factor of up to 2.6-4.2. One of its case studies compares a 26 KLOC TDD project with a 149 KLOC non-TDD one. And, what a surprise, the 149 KLOC project contains more bugs! Strange as it may seem...
It is also well known that both productivity and defect density depend significantly on the developer. The difference between professionals and beginners can be dramatic, up to 10 times. Therefore we can compare the results of two teams only if they are "equal" in terms of the developers’ skills. In the case of [8] I could not see that this was so.
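A comparison of projects of such different sizes only makes sense on normalized numbers. As a minimal sketch (the defect counts below are purely hypothetical and are not taken from [8]; only the two project sizes come from the text), raw counts and densities can point in opposite directions:

```cpp
#include <cassert>

// Defect density: defects per thousand lines of code (KLOC).
double defectDensity( int defects, double kloc )
{
    return defects / kloc;
}

// Hypothetical illustration: the defect counts are invented for this sketch.
// Only the project sizes (26 and 149 KLOC) come from the text.
void densityExample()
{
    const int smallDefects = 52;   // assumed count for the 26 KLOC project
    const int largeDefects = 149;  // assumed count for the 149 KLOC project

    // The larger project has more defects in absolute terms...
    assert( largeDefects > smallDefects );
    // ...but a lower density: 149/149 = 1.0 versus 52/26 = 2.0 per KLOC.
    assert( defectDensity( largeDefects, 149.0 ) < defectDensity( smallDefects, 26.0 ) );
}
```

So before two projects are compared, it has to be clear whether absolute defect counts or densities are being quoted.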
The situation in [7] is hardly better, but at least you can understand what was compared with what. The authors honestly and objectively describe all the doubtful sides of the experiment in the chapter "Case Study: Obscurities". After reading it you can see that the experimental conditions were far from ideal and that no reliable conclusions about defect density could be drawn.
- Methodologies must be used to the best extent possible
One has to realize that there are methods and methods. For example, when one works the code out in pseudocode first, one can expect a big increase in code quality because the algorithm was worked out in detail before coding. Code review also has a great impact on code quality, and it is the only way to catch up to 80% of errors [11]. I could not find out whether these methods were used in [7, 8].
Logically, the correct comparison has to have the following characteristics:
- Equally strong teams.
- The same task settings.
- The same level of experience in the task given.
- Excellent command of the development technique in question: TDD with all the tricks possible versus ITL with all its know-how (review of common design principles, pseudocode programming, code review and so on).
- Code defects are studied after the coding cycle is finished. In ITL we examine the result immediately after the coding phase. In TDD we study code that has passed all unit tests and has been fully refactored.
- Code coverage
Now let’s talk about code coverage. What does properly implemented TDD look like? Are 100% of methods and functions covered? [2, 3] raise some well-founded objections as to why that could be unprofitable from both the economic and the architectural point of view. We will talk more about it later.
As I understand it, properly implemented TDD for a given function (method) means 100% code coverage inside that function (the industry standard is 80-90% [14]). Current tools are not always able to produce a correct coverage report [12]. However, let’s assume that you use the right tools and have accepted my suggestion: all the methods/functions you have decided to develop with TDD are 100% covered. Does that coverage guarantee that the code is bug-free? The answer is scary: no, it does not.
Example: consider a 3x3 matrix. We make a move within the matrix and check whether we have gone out of bounds.
#include <stdexcept>

// Rejects indices beyond the upper bound of the 3x3 matrix.
void checkRange( int i, int j )
{
    if( i >= 3 ) throw std::out_of_range( "out of range" );
    if( j >= 3 ) throw std::out_of_range( "out of range" );
}

void Move( int i, int j )
{
    checkRange( i, j );
    ...
}
An example of tests that give 100% coverage (gtest is used):
EXPECT_THROW( Move( 0, 3 ), std::exception );
EXPECT_THROW( Move( 3, 0 ), std::exception );
And this is how we create an out-of-bounds condition:
Move( -1, 2 );
The effect could be very sad: the developer sees a green line, yet the bug is still in the code. Bottom line: do not idolize the green line and 100% code coverage, and do not treat them as a panacea.
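One way to close this particular hole is to check both bounds and to probe each side of every boundary instead of stopping when the coverage report says 100%. The sketch below is an assumption about how that could look, not the original code: `checkRangeFixed`, `rejects`, `boundaryChecks` and `kSize` are hypothetical names, and plain `assert` stands in for gtest so that the block is self-contained.

```cpp
#include <cassert>
#include <stdexcept>

constexpr int kSize = 3;  // assumed dimension of the 3x3 matrix

// Hypothetical corrected check: the lower bounds are validated as well.
void checkRangeFixed( int i, int j )
{
    if( i < 0 || i >= kSize ) throw std::out_of_range( "row out of range" );
    if( j < 0 || j >= kSize ) throw std::out_of_range( "column out of range" );
}

// Returns true if the check rejects the given indices.
bool rejects( int i, int j )
{
    try
    {
        checkRangeFixed( i, j );
    }
    catch( const std::out_of_range& )
    {
        return true;
    }
    return false;
}

// Boundary-value checks: one probe on each side of every boundary.
void boundaryChecks()
{
    assert( rejects( -1, 2 ) );   // the case the two original tests missed
    assert( rejects( 2, -1 ) );
    assert( rejects( 3, 0 ) );
    assert( rejects( 0, 3 ) );
    assert( !rejects( 0, 0 ) );   // corners of the valid range
    assert( !rejects( 2, 2 ) );
}
```

Note that the two original tests execute every line of the broken check and of the fixed one alike, so line coverage alone cannot tell the two versions apart.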
Now let’s talk about code design and TDD. As usual, alas, we have two opposite opinions:
- TDD gives us crappy code
- TDD gives us nice clean code that works
The situation gets complicated right from the definition of poor-quality design. With defect density it was absolutely clear what to measure; now things get more difficult. Traditionally, a set of metrics is used for that.
There are attempts [4, 9, 10] to evaluate TDD from the design quality point of view. A summary table of earlier studies is given in [7]. The results are marvelous: as usual, everyone contradicts everyone.
Some articles (a miracle!) do not conflict with each other, so we can summarize:
- TDD-produced code has lower cyclomatic complexity
- TDD-produced code has lower RFC (Response For a Class). RFC for a class is the total number of methods that can be invoked when a message is sent to an object of that class. Roughly speaking, suppose there is a class foo with methods A, B and C, and we send it the message C. If A and B also get involved, RFC is higher. If no method other than C is involved, RFC is lower. The lower the better.
- TDD-produced code has a greater DIT (depth of inheritance tree). The lower the better.
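The rough description of RFC above can be illustrated with a sketch. The classes `Chatty` and `Quiet` and the helper `methodsInvokedByC` are hypothetical, and the counter only records what happens for one particular message; it illustrates the idea and is not how metric tools actually compute RFC.

```cpp
#include <cassert>

// Sending C to a Chatty object pulls in A and B as well.
class Chatty
{
public:
    int invoked = 0;  // number of methods run in response to messages

    void A() { ++invoked; }
    void B() { ++invoked; }
    void C() { ++invoked; A(); B(); }
};

// Sending C to a Quiet object involves C alone.
class Quiet
{
public:
    int invoked = 0;

    void A() { ++invoked; }
    void B() { ++invoked; }
    void C() { ++invoked; }
};

// Counts how many methods respond when the message C is sent.
template <typename T>
int methodsInvokedByC()
{
    T object;
    object.C();
    return object.invoked;
}

void rfcExample()
{
    assert( methodsInvokedByC<Chatty>() == 3 );  // C, A and B
    assert( methodsInvokedByC<Quiet>() == 1 );   // C only
}
```

In the terms used above, `Chatty` is the class with the higher response and `Quiet` the one with the lower response.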
I don’t even want to talk about coupling and cohesion.
In both defect density and design quality measurements the same concerns arise: dependence on the human factor and on the method used.
The software production methodology is especially important when it comes to design quality. It is a big mistake to assume that programming with unit tests (either TL or TF) places no restrictions on the application’s codebase. It does, and big time! At Google [6, 15] code is refactored in a particular way to provide the coverage needed. The use of fakes and mocks also affects the code structure (through so-called dependency injection). If you wish to test private methods, be prepared for that to affect the code as well [16]. Whether these advanced TDD techniques were used in the experiments is not known. There is only one shy mention in [4].
So the only thing I can safely conclude is this: if you are an ITL developer and you are lazy enough not to use cyclomatic complexity checking tools, go and switch to TDD.
Here is one more very important aspect. Unit tests are quite resistant to code changes [2]. TDD assumes an evolutionary design model in which the application grows together with its unit-test codebase. At the same time (what a paradox!) this unit-test codebase resists the application’s further growth!
However, no one has mentioned the method of big and small steps. For example, it may be quite obvious from the very beginning that the business logic of an application should be connected to the GUI via the MVC pattern. So one decides to implement it straight away and takes so-called big TDD steps.
On the other hand, as I see it, big TDD steps do not differ much from ITL (which, to be precise, now means design-code-unit-test-test).
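To make the point about fakes and dependency injection concrete, here is a minimal sketch. All names (`Clock`, `FakeClock`, `Greeter`, `greeterChecks`) are hypothetical and are not taken from [6, 15]; the only point is that the class under test has to accept its dependency from outside, which is a structural change made for the sake of testability.

```cpp
#include <cassert>
#include <string>

// Abstraction introduced so that a test can replace the real dependency.
class Clock
{
public:
    virtual ~Clock() = default;
    virtual int hour() const = 0;  // 0..23
};

// Fake used in tests: returns whatever hour the test has set.
class FakeClock : public Clock
{
public:
    explicit FakeClock( int hour ) : hour_( hour ) {}
    int hour() const override { return hour_; }

private:
    int hour_;
};

// The class under test receives its dependency through the constructor
// (dependency injection) instead of reading the system time directly.
class Greeter
{
public:
    explicit Greeter( const Clock& clock ) : clock_( clock ) {}

    std::string greeting() const
    {
        return clock_.hour() < 12 ? "Good morning" : "Good afternoon";
    }

private:
    const Clock& clock_;
};

// A unit test in miniature: behaviour is checked without the real clock.
void greeterChecks()
{
    FakeClock morning( 9 );
    FakeClock afternoon( 15 );
    assert( Greeter( morning ).greeting() == "Good morning" );
    assert( Greeter( afternoon ).greeting() == "Good afternoon" );
}
```

The greeting becomes testable in a deterministic way, but the production code gains an interface and a constructor parameter that exist mainly for the tests.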
TDD and development speed
And again, as usual, everyone contradicts everyone:
- TDD does not affect the development time [18].
- TDD increases the development time by up to 16% [8, 14, 18].
A process becomes slower if we add something to it and faster if we remove something from it. Obviously, adding unit tests to the coding phase will make it slower. So if the other conditions are the same ([14]: "The task was to complete a program in which the specification was given along with the necessary design and method declarations; the students completed the body of the necessary methods") then TDD has to be slower.
Conclusion: TDD really does lengthen the coding phase (other conditions being equal).
TL vs TF
Montague and Capulet... The most interesting research is presented in [7]. It considers several hypotheses, and I would like to comment on all of them:
- 1T: Test-First programmers write more tests.
- 1Q: Test-First programmers produce a higher quality product
- 1P: Test-First programmers are more productive
- 2Q: Higher number of tests implies higher quality product
- 2P: Higher number of tests implies higher productivity
1T: the hypothesis is accepted. It also makes perfect sense. TDD is evolutionary design and assumes thinking by writing tests. Obviously, more tests get written with this approach.
By the way, here is something to think about. DeMarco [11] presents the idea of zero defect development (ZDD). ZDD has many components and methods, and one of them is to find bugs as early in the development process as possible. The sooner a bug is found, the cheaper it is to fix. Fixing one word in the specification is one thing. Fixing code that was written against an incorrect spec is a completely different thing. Yeah, dear agile fans, I mean it! I really mean it. But let’s stick to the point. DeMarco offers techniques that can bring bugs to light as early as possible. One of the methods is ... to count the number of times you compile the code. The more, the worse.
I don’t want to say anything about a company crazy or brave enough to introduce something like that. However, I can say a few words about the deeper sense of it. Assume there are two developers, John and Peter. John sits for a while. Thinks hard. Writes some careful code, starts the compiler. Finds a forgotten semicolon, swears a bit and voilà, the program is working. Peter just blindly starts hacking out code, starts the compiler, fixes bugs, starts the compiler again, fixes bugs ... ad infinitum. Whose code would be better? Do we have a parallel with TDD here?
1Q: the hypothesis is rejected. In "TDD and code quality" (see above) we considered all the pros and cons, and there is no point in repeating them.
1P: the hypothesis is rejected. [7] uses the following definition: "Productivity measure was obtained by normalizing the number of delivered stories by total programming effort". In "TDD and code quality" we stated that different developers can produce code with very different defect density, differing by up to 10 times! Or, to put it simply, a professional can write some code in 3 days and it will have 1 defect per KLOC, whereas a student will struggle for a month and the code will have 300 defects per KLOC. The authors of [7] seem to have thought about it. However, "Prequestionnaire classified experience and skill levels, before the students were randomly assigned to one of the two groups" sounds suspicious. If random assignment gives one group better people, then any method of assessment means nothing. "Cadres decide everything" © Stalin.
2Q: the hypothesis is rejected. One can imagine the joy of [2, 3].
2P: the hypothesis is accepted. And it makes sense! If I am a careful and hard-working developer and I have been assigned to write tests then, no doubt, I will do my very best and even more.
So what are the conclusions? Stop measuring productivity in KLOC per day and start measuring it in unit tests per day, ha-ha.
Conclusions
- 100% code coverage does not replace either code review or checking the code with any of the newfangled source code analyzers such as prefast/sdv/pep/lint/klee/clang, etc. I am already tired of listing them all.
- Every stick has two ends: unit tests give you confidence when refactoring, but at the same time they resist code changes. Find a balance.
- The positive influence of TDD on code design is very doubtful, to say the least. TDD by itself does not guarantee the right design of your application.
- TDD makes the coding phase slower.
- The question of which is cooler, TF or TL (design-code-unit-test), is still open.
References
1. http://gamesfromwithin.com/?p=50
2. http://bishop-it.ru/?p=119
3. http://www.joelonsoftware.com/items/2009/01/31.html
4. Hans Wasmus, Hans-Gerhard Gross. Evaluation of Test Driven Development. An Industrial Case Study.
5. http://blogs.msdn.com/cellfish/default.aspx
6. http://googletesting.blogspot.com/
7. Philip Ritzkopf. Effects of Test Driven Development. An Evaluation of Empirical Studies.
8. Thirumalesh Bhat, Nachiappan Nagappan. Evaluating the Efficacy of Test-Driven Development: Industrial Case Studies.
9. Maria Siniaalto, Pekka Abrahamsson. Does Test-Driven Development Improve the Program Code? Alarming Results from a Comparative Case Study.
10. David S. Janzen. Software Architecture Improvement through Test-Driven Development.
11. T. DeMarco. Controlling Software Projects: Management, Measurement, and Estimates.
12. http://blogs.msdn.com/cellfish/archive/2008/11/18/dangers-of-using-visual-studio-2008-team-system-code-coverage-tool-for-native-c.aspx
13. http://www.symphonious.net/2006/02/27/test-driven-development-and-the-myth-of-better-code/
14. Boby George, Laurie Williams. An Initial Investigation of Test Driven Development in Industry.
15. http://misko.hevery.com/code-reviewers-guide/
16. http://code.google.com/p/googletest/wiki/GoogleTestAdvancedGuide#Testing_Private_Code
17. T. DeMarco. The Deadline: A Novel About Project Management.
18. Lech Madeyski. Preliminary Analysis of the Effects of Pair Programming and Test-Driven Development on the External Code Quality.
Posted by: volodya, 21.2.2010 at 10:51