Vitalii Tsybulnyk / Виталий Цыбульник

On Software Development / О софтверной разработке

About the author

    Vitalii Tsybulnyk
Vitalii Tsybulnyk is a Software Engineering Manager at Mictosoft Azure.
E-mail me Send mail

Activity

Recent comments

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008-2017

Design Guidelines for Developing Frameworks and Class Libraries

After I spent the last couple of months designing some class library for Windows Azure engineering infrastructure, I realized that design principles for frameworks and class libraries are not exactly the same as for 'off the shelf' or enterprise applications and systems. The fundamental difference is that, in the case of applications, customers don't care about your code at all, so you can use all techniques you want to make your design elegant and easy to maintain. In the case of framework/library, quite the contrary is true: your code is in some ways a user interface, which customers see, use, and care about a lot. Believe it or not, this difference significantly influences your architecture and OOD decisions in ways you may not expect.

In this post, I've collected the advice I'd give to framework/library designers. Some of these suggestions are based on these sources [1-2]; however, most of them are from my own experience. Some of this advice might contradict traditional design principles, so be careful and use them for public API/frameworks only.

Fundamentals

1. Framework designers often make the mistake of starting with the design of the object model (using various design methodologies) and then write code samples based on the resulting API. The problem is that most design methodologies (including most commonly used object-oriented design methodologies) are optimized for the maintainability of the resulting implementation, not for the usability of the resulting APIs. They are best suited for internal architecture designs—not for designs of the public API layer of a large framework. When designing a framework, you should start with producing a scenario-driven API specification. This specification can be either separate from the functional specification or can be a part of a larger specification document. In the latter case, the API specification should precede the functional one in location and time. The specification should contain a scenario section listing the top 5-10 scenarios for a given technology area and show code samples that implement these scenarios.

2. Common scenario APIs should not use many abstractions but rather should correspond to physical or well-known logical parts of the system. As noted before, standard OO design methodologies are aimed at producing designs that are optimized for maintainability of the code base. This makes sense as the maintenance cost is the largest chunk of the overall cost of developing a software product. One way of improving maintainability is through the use of abstractions. Because of that, modern design methodologies tend to produce a lot of them.

The problem is that frameworks with lot of abstractions force users to become experts in the framework architecture before starting to implement even the simplest scenarios. But most developers don’t have the desire or business justification to become experts in all of the APIs such frameworks provide. For simple scenarios, developers demand that APIs be simple enough so that they can be used without having to understand how the entire feature areas fit together. This is something that the standard design methodologies are not optimized for, and never claimed to be optimized for.

Naming Guidelines

3. The code samples should be in at least two programming languages. This is very important as sometimes code written using those languages differs significantly. It is also important that these scenarios be written using different coding styles common among users of the particular language (using language specific features). The samples should be written using language-specific casing. For example, VB.NET is case-insensitive, so samples should reflect that. Think about different languages even when you name classes, e.g. don't make mistakes like the NullReferenceException class which can be thrown by VB code, but VB uses Nothing, not null. Avoid using identifiers that conflict with keywords of widely used programming languages.

4. The simplest, but also most often missed opportunity for making frameworks self-documenting is to reserve simple and intuitive names for types that developers are expected to use (instantiate) in the most common scenarios. Framework designers often “burn” the best names for less commonly used types, with which most users do not have to be concerned. For example, a type used in mainline scenarios to submit print jobs to print queues should be named Printer, rather than PrintQueue. Even though technically the type represents a print queue and not the physical device (printer), from the scenario point of view, Printer is the ideal name as most people are interested in submitting print jobs and not in other operations related to the physical printer device (such as configuring the printer). If you need to provide another type that corresponds, for example, to the physical printer to be used in configuration scenarios, the type could be called PrinterConfiguration or PrinterManager.

Similarly, names of most commonly used types should reflect usage scenarios, not inheritance hierarchy. Most users use the leaves of an inheritance hierarchy almost exclusively, and are rarely concerned with the structure of the hierarchy. Yet, API designers often see the inheritance hierarchy as the most important criterion for type name selection. For example, naming the abstract base class File and then providing a concrete type NtfsFile works well if the expectation is that all users will understand the inheritance hierarchy before they can start using the APIs. If the users do not understand the hierarchy, the first thing they will try to use, most often unsuccessfully, is the File type. While this design works well in the object-oriented design sense (after all NtfsFile is a kind of File) it fails the usability test, because “File” is the name most developers would intuitively think to program against.

Classes vs. Interfaces

5. In general, classes are the preferred construct for exposing abstractions. The main drawback of interfaces is that they are much less flexible than classes when it comes to allowing for evolution of APIs. Once you ship an interface, the set of its members is fixed forever. The only way to evolve interface-based APIs is to add a new interface with the additional members. A class offers much more flexibility.

6. Abstract types do version much better, then allow for future extensibility, but they also burn your one and only one base type. Interfaces are appropriate when you are really defining a contract between two objects that is invariant over time. Abstract base types are better for define a common base for a family of types.

7. When a class is derived from a base class, I say that the derived class has an IS-A relationship with the base. For example, a FileStream IS-A Stream. However, when a class implements an interface, I say that the implementing class has a CAN-DO relationship with the interface. For example, a FileStream CAN-DO disposing.

Methods vs. Properties

There are two general styles of API design in terms of usage of properties and methods: method-heavy APIs, where methods have a large number of parameters and the types have fewer properties, and property-heavy APIs, where methods with a small number of parameters and more properties to control the semantics of the methods.

8. All else being equal, the property-heavy design is generally preferable.

9. Properties should look and act like fields as much as possible because library users will think of them and use them as though they were fields.

10. Use a method, rather than a property, in the following situations:

 - The operation is orders of magnitude slower than a field access would be. If you are even considering providing an asynchronous version of an operation to avoid blocking the thread, it is very likely that the operation is too expensive to be a property. In particular operations that access the network or the file system (other than once for initialization) should likely be methods, not properties.

 - The operation returns a different result each time it is called, even if the parameters don’t change. For example, the Guid.NewGuid method returns a different value each time it is called.

 - The operation has a significant and observable side effect. Notice that populating an internal cache is not generally considered an observable side effect.

 - The operation returns an array. Properties that return arrays can be very misleading. Usually it is necessary to return a copy of an internal array so that the user cannot change the internal state. This may lead to inefficient code.

Events

11. Consider using a subclass of EventArgs as the event argument, unless you are absolutely sure the event will never need to carry any data to the event handling method. If you ship an API using EventArgs directly, you will never be able to add any data to be carried with the event without breaking compatibility. If you use a subclass, even if initially completely empty, you will be able to add properties to the subclass when needed.

Enums

12. Use enums if otherwise a member would have two or more Boolean parameters. Enums are much more readable when it comes to books, documentation, source code reviews, etc. Consider a method call that looks as follows.

FileStream f = File.Open (“foo.txt”, true, false);

This call gives the reader no context with which to understand the meaning behind true and false. The call would be much more usable if it were to use enums, as follows:

FileStream f = File.Open(“foo.txt”, CasingOptions.CaseSensitive, FileMode.Open);

Some would ask why we don’t have a similar guideline for integers, doubles, etc. Should we find a way to “name” them as well? There is a big difference between numeric types and Booleans. You almost always use constants and variables to pass numeric values around, because it is good programming practice and you don’t want to have “magic numbers”. However, if you take a look at real life source code, this is almost never true of Booleans. 80% of the time a Boolean argument is passed in as a literal constant, and its intention is to turn a piece of behavior on or off. We could alternatively try to establish a coding guideline that you should never pass a literal value to a method or constructor, but I don’t think it would be practical. I certainly don’t want to define a constant for each Boolean parameter I’m passing in.

Methods with two Boolean parameters, like the one in the example above, allow developers to inadvertently switch the arguments, and the compiler and static analysis tools can't help you. Even with just one parameter, I tend to believe it's still somewhat easier to make a mistake with Booleans ... let's see, does true mean "case insensitive" or "case sensitive"?

 

Sources

1. MSDN 'Design Guidelines for Developing Class Libraries'

2. Krzysztof Cwalina, Brad Abrams 'Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries'


Posted by Vitalii Tsybulnyk on Thursday, August 26, 2010 2:07 PM
Permalink | Comments (0) | Post RSSRSS comment feed

Add comment




  Country flag

b i u quote
Loading