Leonard Richardson, Mike Amundsen, Sam Ruby
Contents
Ch 1. Surfing the Web
Ch 2. A Simple API
Ch 3. Resources and Representations
REST is not a protocol, a file format, or a development framework. It’s a set of design constraints: statelessness, hypermedia as the engine of application state, and so on. Collectively, we call these the Fielding constraints, because they were first identified in Roy T. Fielding’s 2000 dissertation on software architecture, which gathered them together under the name “REST.”
In this chapter, I’ll finish my explanation of the Fielding constraints in terms of the World Wide Web. My “bible,” as it were, will not be the Fielding dissertation. Instead, I’ll be drawing from the W3C’s guide to the Web, The Architecture of the World Wide Web, Volume One (there is no Volume Two). The Fielding dissertation explains the decisions behind the design of the Web, but Architecture explains the three technologies that came out of those decisions: URL, HTTP, and HTML.
A Resource Can Be Anything
A Representation Describes Resource State
A pomegranate can be an HTTP resource, but you can’t transmit a pomegranate over the Internet. A row in a database can be an HTTP resource; in fact, it can be an information resource, because you can literally send it over the Internet. But what would the client do with a chunk of binary data, ripped from an unknown database without any context?
When a client issues a GET request for a resource, the server should serve a document that captures the resource in a useful way. That’s a representation—a machine-readable explanation of the current state of a resource. The size and ripeness of the pomegranate, the data contained in the database fields.
The server might describe a database row as an XML document, a JSON object, a set of comma-separated values, or as the SQL INSERT statement used to create it. These are all legitimate representations; it depends on what the client asks for.
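To make this concrete, here’s a minimal sketch of one resource served as three different representations. The row contents, the function names, and the choice of media types are all assumptions made for illustration; a real server would pick the renderer based on the client’s request.

```python
import csv
import io
import json
from xml.etree import ElementTree as ET

# Hypothetical resource state: one row from a "books" table.
ROW = {"id": 1, "title": "Example: A Novel", "price": "9.99"}


def to_json(row):
    return json.dumps(row, sort_keys=True)


def to_xml(row):
    root = ET.Element("row")
    for name, value in row.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_csv(row):
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(row.keys())    # header line
    writer.writerow(row.values())  # data line
    return buffer.getvalue()


# Illustrative mapping from media type to renderer.
RENDERERS = {
    "application/json": to_json,
    "application/xml": to_xml,
    "text/csv": to_csv,
}


def represent(row, media_type):
    """Return (media_type, body) for the requested media type."""
    if media_type not in RENDERERS:
        raise ValueError(f"no representation available as {media_type}")
    return media_type, RENDERERS[media_type](row)
```

The resource state is the same in every case; only the document that describes it changes.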
One application might represent a pomegranate as an item for sale, using a custom XML vocabulary. Another might represent it with a binary image taken by a Pomegranate-Cam. It depends on the application. A representation can be any machine-readable document containing any information about a resource.
Representations Are Transferred Back and Forth
We think of representations as something the server sends to the client. That’s because when we surf the Web, most of our requests are GET requests. We’re asking for representations. But in a POST, PUT, or PATCH request, the client sends a representation to the server. The server’s job is then to change the resource state so it reflects the incoming representation.
The server sends a representation describing the state of a resource. The client sends a representation describing the state it would like the resource to have. That’s representational state transfer.
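The two directions of transfer can be sketched without any networking at all. In this toy example, a dictionary stands in for the server, and the path, field names, and function names are hypothetical.

```python
import json

# Hypothetical in-memory "server": maps a URL path to resource state.
RESOURCES = {"/pomegranates/1": {"size": "large", "ripeness": "unripe"}}


def handle_get(path):
    """Server to client: a representation of the current resource state."""
    return json.dumps(RESOURCES[path], sort_keys=True)


def handle_put(path, representation):
    """Client to server: a representation of the desired resource state."""
    RESOURCES[path] = json.loads(representation)


# The client fetches a representation, edits it, and sends it back.
current = json.loads(handle_get("/pomegranates/1"))
current["ripeness"] = "ripe"
handle_put("/pomegranates/1", json.dumps(current))
```

After the PUT, a second GET describes the new state, because the server changed the resource to match the representation it was sent.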
The Protocol Semantics of HTTP
Overloaded POST
Those two strings are not much to work from. Until recently, application semantics were so poorly understood that I recommended not using overloaded POST at all. But if you follow the advice I give in Chapter 8, you can use a profile to reliably communicate application semantics to your clients. It won’t be as reliable as the protocol semantics—every HTTP client ever made knows what GET means—but you’ll be able to do it.
Since an overloaded POST request can do anything at all, the POST method is neither safe nor idempotent. One particular overloaded POST request may turn out to be safe, but as far as HTTP is concerned, POST is unsafe.
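One practical consequence of these properties, sketched below, is that a client can decide from the method alone whether resending a request after a network failure is harmless. The table reflects how HTTP defines the common methods (PATCH comes from RFC 5789 rather than RFC 2616); the retry rule and the function name are my own illustration, not something HTTP mandates.

```python
# Safety and idempotence as HTTP defines them for its common methods.
METHOD_PROPERTIES = {
    "GET":     {"safe": True,  "idempotent": True},
    "HEAD":    {"safe": True,  "idempotent": True},
    "OPTIONS": {"safe": True,  "idempotent": True},
    "PUT":     {"safe": False, "idempotent": True},
    "DELETE":  {"safe": False, "idempotent": True},
    "POST":    {"safe": False, "idempotent": False},
    "PATCH":   {"safe": False, "idempotent": False},
}


def can_retry_automatically(method):
    """Illustrative policy: only resend requests whose method is idempotent."""
    props = METHOD_PROPERTIES.get(method.upper())
    return bool(props and props["idempotent"])
```

Under this policy a timed-out PUT can be resent, but a timed-out POST cannot, even if that particular POST happened to be harmless.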
Which Methods Should You Use?
If you want an API entirely described by HTML documents, then your protocol semantics are limited to GET and POST. If you want to speak to filesystem GUI applications like Microsoft’s Web Folders, you’ll be using HTTP plus the WebDAV extensions. If you need to talk to a wide variety of HTTP caches and proxies, you should stay away from PATCH and other methods not defined in RFC 2616.
Ch 4. Hypermedia
Look closer, and you’ll see a question that hasn’t been answered: how does the client know which requests it can make? There are infinitely many URLs. How does a client know which URLs have representations behind them and which ones will give a 404 error? Should the client send an entity-body with its POST request? If so, what should the entity-body look like? HTTP defines a set of protocol semantics, but which subset of those semantics does this web server support on this URL right now?
The missing piece of the puzzle is hypermedia. Hypermedia connects resources to each other, and describes their capabilities in machine-readable ways. Properly used, hypermedia can solve—or at least mitigate—the usability and stability problems found in today’s web APIs.
Like REST, hypermedia isn’t a single technology described by a standards document somewhere. Hypermedia is a strategy, implemented in different ways by dozens of technologies. I’ll cover several hypermedia standards in the next three chapters, and a whole lot more in Chapter 10. It’s up to you to choose the technologies that fit your business requirements.
The hypermedia strategy always has the same goal. Hypermedia is a way for the server to tell the client what HTTP requests the client might want to make in the future. It’s a menu, provided by the server, from which the client is free to choose. The server knows what might happen, but the client decides what actually happens.
In this chapter, I want to dispel the mystery of hypermedia, so you can create APIs that have some of the flexibility of the Web.
HTML as a Hypermedia Format
To sum up, the familiar HTML controls allow the server to describe four kinds of HTTP requests.
The <a> tag describes a GET request for one specific URL, which is made only if the user triggers the control.
The <img> tag describes a GET request for one specific URL, which happens automatically, in the background.
The <form> tag with method="POST" describes a POST request to one specific URL, with a custom entity-body constructed by the client. The request is only made if the user triggers the control.
The <form> tag with method="GET" describes a GET request to a custom URL constructed by the client. The request is only made if the user triggers the control.
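Here’s a rough sketch of how a program might read those four controls out of a document, using Python’s standard html.parser module. The sample document, its URLs, and the class name are hypothetical, and the sketch ignores HTML’s other controls.

```python
from html.parser import HTMLParser


class ControlFinder(HTMLParser):
    """Collect (method, url, trigger) for the four controls described above."""

    def __init__(self):
        super().__init__()
        self.requests = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.requests.append(("GET", attrs["href"], "on user action"))
        elif tag == "img" and "src" in attrs:
            self.requests.append(("GET", attrs["src"], "automatic"))
        elif tag == "form":
            # A form without a method attribute defaults to GET.
            method = (attrs.get("method") or "get").upper()
            # For a GET form this is only the base URL; the client
            # appends a query string built from the form's fields.
            url = attrs.get("action") or ""
            self.requests.append((method, url, "on user action"))


# Hypothetical document used only for this illustration.
DOCUMENT = """
<a href="/mazes/">All mazes</a>
<img src="/logo.png">
<form action="/search" method="GET"><input name="q"></form>
<form action="/messages" method="POST"><input name="text"></form>
"""

finder = ControlFinder()
finder.feed(DOCUMENT)
```

The list in finder.requests is the server’s menu: every request the document describes, along with whether it fires automatically or waits for the user.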
HTML also defines some more exotic hypermedia controls, and other data formats may define controls that are stranger still. All of them fall under the formal definition of hypermedia given in the Fielding dissertation:
Hypermedia is defined by the presence of application control information embedded within, or as a layer above, the presentation of information.
The World Wide Web is full of HTML documents, and the documents are full of things people like to read—prices, statistics, personal messages, prose, and poetry. But all of those things fall under presentation of information. In terms of presentation of information, the Web isn’t much different from a printed book.
It’s the application control information that distinguishes an HTML document from a book. I’m talking about the hypermedia controls that people interact with all the time, but rarely examine closely. The <img> tags that tell the browser to embed certain images, the <a> tags that transport the end user to another part of the Web, and the <script> tags that supply JavaScript for the browser to execute.
An HTML document that contains a poem will probably also feature a link to “Other poems by this author,” or a form that lets the reader “Rate this poem.” This is application control information that couldn’t show up in a printed book of poetry. The presence of application control information can certainly reduce the emotional impact of a poem, but an HTML document containing only the text of a poem is not a full participant in the Web. It’s just simulating a printed book.
URI Templates
RFC 6570, URI Template
URI Versus URL
Most web APIs deal exclusively with URLs, so for most of this book, the distinction doesn’t matter. But when it’s important (as it will be in Chapter 12), it’s really important.
A URL is a short string used to identify a resource. A URI is also a short string used to identify a resource. Every URL is a URI. They’re described in the same standard: RFC 3986.
What’s the difference? As far as this book is concerned, the difference is this: there’s no guarantee that a URI has a representation. A URI is nothing but an identifier. A URL is an identifier that can be dereferenced. That is, a computer can somehow take a URL and get a representation of the underlying resource.
Here’s a URI that’s not a URL: urn:isbn:9781449358063. It designates a resource: the print edition of this book. Not any particular copy of this book, but the abstract concept of an entire edition. (Remember that a resource can be anything at all.) This URI is not a URL because… what’s the protocol? How would a computer get a representation? You can’t do it.
Without a URL, you can’t get a representation. Without representations, there can be no representational state transfer. A resource that’s not identified by a URL cannot fulfill many of the Fielding constraints. It can’t fulfill the self-descriptive message constraint, because it can’t send any messages. A representation can link to a URI that’s not a URL (<a href="urn:isbn:9781449358063">), but that won’t fulfill the hypermedia constraint, because a client can’t follow the link.
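A client can approximate this distinction by looking at the scheme. The sketch below assumes a client that only knows how to dereference http: and https: identifiers; that set of schemes and the function name are assumptions for illustration, not part of RFC 3986.

```python
from urllib.parse import urlsplit

# Assumption: this particular client can only dereference these schemes.
DEREFERENCEABLE_SCHEMES = {"http", "https"}


def is_url(uri):
    """True if this client knows a protocol for fetching a representation."""
    return urlsplit(uri).scheme in DEREFERENCEABLE_SCHEMES
```

With this definition, is_url("http://www.example.com/") is true and is_url("urn:isbn:9781449358063") is false: both strings identify a resource, but only the first tells the client how to get a representation.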
The Link Header
The Link header has approximately the same functionality as an HTML <a> tag. I recommend you use real hypermedia formats whenever possible, but when that’s not an option, the Link header can be very useful.
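As a rough illustration, here’s a sketch that pulls link targets and relations out of a Link header value. It handles only the simple form `<url>; rel="name"`, not the full header grammar (for example, it treats a rel containing several space-separated relations as one string), and the URLs in the usage note are hypothetical.

```python
import re

# Matches one simple link-value: <target>; rel="relation"
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_links(header_value):
    """Map each relation to its target URL for simple Link header values."""
    return {rel: url for url, rel in LINK_PATTERN.findall(header_value)}
```

Given the value `<http://www.example.com/story/part2>; rel="next"`, parse_links returns a dictionary with the single key "next", which a client can treat much like an <a> tag with rel="next".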
What Hypermedia Is For
We need to take a step back and see what hypermedia is for.
Hypermedia controls have three jobs:
- They tell the client how to construct an HTTP request: what HTTP method to use, what URL to use, what HTTP headers and/or entity-body to send.
- They make promises about the HTTP response, suggesting the status code, the HTTP headers, and/or the data the server is likely to send in response to a request.
- They suggest how the client should integrate the response into its workflow.
Beware of Fake Hypermedia!
There are a lot of existing APIs that were designed by people who understood the benefits of hypermedia, but that don’t technically contain any hypermedia. Imagine a bookstore API that serves a JSON representation like this:
HTTP/1.1 200 OK
Content-Type: application/json

{
  "title": "Example: A Novel",
  "description": "http://www.example.com/"
}
This is a representation of a book. The description field happens to look like a URL: http://www.example.com/. But is this a link? Is description supposed to link to a resource that gives the description? Or is it supposed to be a textual description, and some smart aleck typed in some text that happens to be a valid URL?
Formally speaking, "http://www.example.com/" is a string. The application/json media type doesn’t define any hypermedia controls, so even if some part of a representation really looks like a hypermedia link, it’s not! It’s just a string!
If you’re trying to consume an API like this, you won’t get very far dogmatically denying the existence of links. Instead, you’ll read some human-readable documentation written by the API provider. That documentation will explain the conventions the provider used to embed hypermedia links in a format (JSON) that doesn’t support hypermedia. Then you’ll know how to distinguish between links and strings, and you’ll be able to write a client that can detect and follow the hypermedia links.
But your client will only work for that specific API. The documentation you read is the documentation for a one-off fiat standard. The next API you use will have a different set of conventions for embedding hypermedia links in JSON, and you’ll have to do the work all over again.
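The problem is easy to see in code. In this sketch, the JSON parser hands back two plain strings, and the only way to find the “link” is a hardcoded rule copied out of one provider’s documentation. The rule shown here (the description field is a link) is a made-up convention for illustration.

```python
import json

body = '{"title": "Example: A Novel", "description": "http://www.example.com/"}'
book = json.loads(body)

# Hypothetical one-off convention, as a provider's documentation might
# state it: "the value of 'description' is a link."
LINK_FIELDS = {"description"}


def find_links(document):
    """Apply this one API's convention; useless against any other API."""
    return {key: value for key, value in document.items() if key in LINK_FIELDS}
```

Nothing in the parsed document distinguishes title from description. All of the knowledge lives in LINK_FIELDS, which has to be rewritten for every API that invents its own convention.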
That’s why API designers shouldn’t design APIs that serve plain JSON. You should use a media type that has real support for hypermedia. Your users will thank you. They’ll be able to use preexisting libraries written against the media type, rather than writing new ones specifically for your API.
JSON has been the most popular representation format for APIs for quite a while, but as recently as a couple of years ago, there were no JSON-based hypermedia formats. As you’ll see in the next few chapters, that has changed. Don’t worry that you’ll have to give up JSON to gain real hypermedia.
The Semantic Challenge: How Are We Doing?
The application described by HTML is the World Wide Web, a very flexible application that’s used for all sorts of things.
A hypermedia format doesn’t have to be generic like HTML. It can be defined in enough detail to convey the application semantics of a wiki or a store. In the next chapter, I’ll talk about hypermedia formats that are designed to represent one specific type of problem. Outside that problem space, they’re practically useless. But within their limits, they meet the semantic challenge very well.
Chapter 5. Domain-Specific Designs
Maze+XML: A Domain-Specific Design
The media type of a Maze+XML document is application/vnd.amundsen.maze+xml. If you ever make an HTTP request and see that string used as the Content-Type of the response, you’ll know that you need the Maze+XML specification to fully understand the entity-body. This is how a domain-specific design meets the semantic challenge: by defining a document format that represents the problem (such as the layout of a maze), and by registering a media type for that format, so that a client knows right away when it’s encountered an instance of the problem.
In general, I don’t recommend creating new domain-specific media types. It’s usually less work to add application semantics to a generic hypermedia format—a technique I’ll cover in the next two chapters. If you set out to do a domain-specific design, you’ll probably end up with a fiat standard that doesn’t take advantage of the work done by your predecessors. You probably won’t have the flexibility problems that plague most of today’s APIs, but you’ll have done more work for no real benefit.
But a domain-specific design is the average developer’s first instinct when designing an API. What could be more natural than simply solving the problem at hand? That’s why I’m covering domain-specific designs first. It’s easy to show how a custom hypermedia format can bridge the semantic gap.
How Maze+XML Works
Each cell in a Maze+XML maze is an HTTP resource with its own URL. If you send a GET request to the first cell in this maze, you’ll get a representation that looks like this:
<maze version="1.0">
  <cell href="/cells/M" rel="current">
    <title>The Entrance Hallway</title>
    <link rel="east" href="/cells/N"/>
    <link rel="west" href="/cells/L"/>
  </cell>
</maze>
A link relation is a magical string associated with a hypermedia control like Maze+XML’s <link> tag. It explains the change in application state (for safe requests) or resource state (for unsafe requests) that will happen if the client triggers the control. Link relations are formally defined in RFC 5988, but the idea has been around for a long time, and nearly every hypermedia format supports them.
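Here’s a minimal sketch of a client reading the link relations out of the representation shown above, using Python’s standard ElementTree parser. The function name is my own; a real Maze+XML client would handle more of the format than this.

```python
from xml.etree import ElementTree as ET

REPRESENTATION = """
<maze version="1.0">
  <cell href="/cells/M" rel="current">
    <title>The Entrance Hallway</title>
    <link rel="east" href="/cells/N"/>
    <link rel="west" href="/cells/L"/>
  </cell>
</maze>
"""


def exits(xml_text):
    """Map each link relation in the current cell to its target URL."""
    cell = ET.fromstring(xml_text.strip()).find("cell")
    return {link.get("rel"): link.get("href") for link in cell.findall("link")}
```

Calling exits(REPRESENTATION) gives the client two choices, keyed by relation: "east" leads to /cells/N and "west" leads to /cells/L. The relation is what tells the client what each link means.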
RFC 5988 defines two kinds of link relations: registered relation types and extension relation types. Registered link relations look like the ones you see in the IANA registry: short strings like east and previous. To avoid conflicts, these short strings need to be registered somewhere—not necessarily with the IANA, but in some kind of standard such as the definition of a media type.
Chapter 9 includes a guide explaining when it’s OK to use the shorter names of registered relations. Here’s a summary:
- You can use extension relations wherever you want.
- You can use IANA-registered link relations whenever you want.
- If a document’s media type defines some registered relations, you can use them within the document.
- If a document includes a profile that defines some link relations (see Chapter 8), you can treat them as registered relations within that document.
- Don’t give your link relations names that conflict with the names in the IANA registry.
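The first four rules can be sketched as a single check. This relies on the fact that RFC 5988 defines extension relation types as URIs; the set of IANA names below is a small sample rather than the full registry, and the function and parameter names are hypothetical. The last rule is about naming your own relations, so the check doesn’t cover it.

```python
from urllib.parse import urlsplit

# A small, non-exhaustive sample of IANA-registered relation names.
IANA_SAMPLE = {"next", "previous", "self", "collection", "item"}


def may_use(rel, media_type_relations=(), profile_relations=()):
    """Rough check of whether a relation name is usable in a document."""
    if urlsplit(rel).scheme:
        return True  # extension relations (URIs) are usable anywhere
    return (
        rel in IANA_SAMPLE                # registered with the IANA
        or rel in media_type_relations    # defined by the media type
        or rel in profile_relations       # defined by a profile
    )
```

For example, the short name east passes only when the caller supplies it as one of the relations defined by the document’s media type or profile, while a relation spelled out as a full URI always passes.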
Follow a Link to Change Application State
The Collection of Mazes
Is Maze+XML an API?
But this book focuses on web APIs, which is to say, web-scale APIs (i.e., APIs where any member of the public can use a client, or write a client, or, in some cases, write a server). When you allow someone outside your organization to make API calls, you make that person a silent partner in the implementation of your server. It becomes very difficult to change anything on the server side without hurting this unknown customer.
This is why public APIs change so rarely. You can’t change an API based on API calls without causing huge pain among your users, any more than you can change the API of a local code library without causing pain. At web scale, API call designs become paralyzed.
Designs based on hypermedia have more flexibility. Every time the client makes an HTTP request, the server sends a response explaining which HTTP requests make the most sense as a next step. If the server-side options change, that document changes along with it. This doesn’t solve all of our API problems—the semantic gap is a huge problem!—but it solves the one we know how to solve.
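Here’s a sketch of what that flexibility looks like from the client side. A dictionary stands in for the server, and the three-cell maze in it is hypothetical data. The point is that the client never constructs a URL; it only chooses among the links the server most recently offered.

```python
from xml.etree import ElementTree as ET

# Hypothetical three-cell maze standing in for a real server.
SERVER = {
    "/cells/M": '<maze version="1.0"><cell href="/cells/M" rel="current">'
                '<link rel="east" href="/cells/N"/></cell></maze>',
    "/cells/N": '<maze version="1.0"><cell href="/cells/N" rel="current">'
                '<link rel="west" href="/cells/M"/>'
                '<link rel="south" href="/cells/O"/></cell></maze>',
    "/cells/O": '<maze version="1.0"><cell href="/cells/O" rel="current">'
                '<link rel="north" href="/cells/N"/></cell></maze>',
}


def get(url):
    """Stand-in for an HTTP GET; returns {rel: href} for the cell's links."""
    cell = ET.fromstring(SERVER[url]).find("cell")
    return {link.get("rel"): link.get("href") for link in cell.findall("link")}


def walk(start, directions):
    """Follow one link per direction, stopping if the server doesn't offer it."""
    url, path = start, [start]
    for direction in directions:
        links = get(url)
        if direction not in links:
            break
        url = links[direction]
        path.append(url)
    return path
```

If the server renamed /cells/N tomorrow, walk would keep working, because the only URL the client ever hardcodes is the starting point.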
Client #1: The Game
The obvious use for the Maze+XML API is a game to be played by a human being. Here’s a single-page app that grabs a collection of mazes and lets you choose one to play. Once you enter a maze, you’re presented with a rat’s-eye view and you navigate the maze by typing in directions. Once you find the exit, you get a score—the number of “turns” you spent in the maze.
A Maze+XML Server
Client #2: The Mapmaker
Client #3: The Boaster
Clients Do the Job They Want to Do
Extending a Standard
The Mapmaker’s Flaw
Maze as Metaphor
Meeting the Semantic Challenge
For the designer of a domain-specific API, bridging the semantic gap is a two-step process:
- Write down your application semantics in a human-readable specification (like the Maze+XML standard).
- Register one or more IANA media types for your design (like application/vnd.amundsen.maze+xml). In the registration, associate the media types with the human-readable document you wrote. In Chapter 9, I’ll discuss the naming and registration process for media types.
Your client developers can reverse the process to bridge the semantic gap in the other direction:
- Look up an unknown media type in the IANA registry.
- Read the human-readable specification to learn how to deal with documents of the unknown media type.
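In client code, the result of that reading usually ends up as a handler keyed by media type. Here’s a minimal sketch; the registry, the decorator, and the handler body are hypothetical, and the handler is where the knowledge gained from the specification would actually live.

```python
# Hypothetical registry: media type -> function that understands it.
HANDLERS = {}


def handles(media_type):
    """Decorator that registers a handler for one media type."""
    def register(func):
        HANDLERS[media_type] = func
        return func
    return register


@handles("application/vnd.amundsen.maze+xml")
def handle_maze(body):
    # Written by someone who has read the Maze+XML specification.
    return "maze document"


def process(content_type, body):
    """Dispatch on the media type, ignoring parameters such as charset."""
    media_type = content_type.split(";")[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is None:
        raise LookupError(f"unknown media type {media_type}: look it up and read its specification")
    return handler(body)
```

The LookupError is the semantic gap showing through: no code can be dispatched until a human has read the document behind the media type.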
There’s no magic shortcut. To get working client code, your users will have to read your human-readable document and do some work. We can’t get rid of the semantic gap completely, because computers aren’t as smart as humans.
Where Are the Domain-Specific Designs?
When you need to publish an API, the first thing to do is to try to find an existing domain-specific design. There’s no point in duplicating someone else’s work.
If You Can’t Find a Domain-Specific Design, Don’t Make One
Kinds of API Clients
Chapter 6. The Collection Pattern
Chapter 7. Pure-Hypermedia Designs
In this chapter, I’ll discuss APIs that use a generic hypermedia language as their representation format. I’ll talk about a number of newfangled representation formats, but the focus of my explanation will be an old format that you’re already familiar with: HTML.
Why HTML?
We think of HTML in the context of the World Wide Web: a network of documents intended to be read by human beings. That popularity makes it the obvious choice for any part of an API that serves documents intended for human consumption. Even if the rest of your API serves XML- or JSON-based representations, you can use HTML for the parts that will be rendered to a human user. Such is HTML’s popularity that every modern operating system ships with a tool for debugging HTML-based web APIs: a web browser.
HTML has distinct advantages even for an API designed to be consumed entirely by machines. HTML imposes more structure on a document than XML or JSON does, but not so much structure as to solve only one specific problem, the way Maze+XML does. HTML sits somewhere in the middle, like Collection+JSON.
Unlike bare XML or JSON, HTML comes packaged with a standardized set of hypermedia controls. But HTML’s controls are very general, and not bound to a specific problem space. Collection+JSON defines a special hypermedia control for search queries; HTML defines a hypermedia control (the <form> tag) that can be used for any purpose at all.
Finally, there’s the popularity argument. HTML is by far the world’s most popular hypermedia format. There are lots of tools for parsing and generating HTML, and most developers know how to read an HTML document. Because HTML is so popular, it’s the base standard for two enormous, ongoing efforts to bridge the semantic gap: microformats and microdata, which I’ll cover later in this chapter.
HTML’s Capabilities
Hypermedia Controls
HTML’s hypermedia controls: <link>, <a>, <img>, <script>, <form> with method="GET", and <form> with method="POST".
Plug-in Application Semantics
HTML defines application semantics for a very general application: human-readable documents. The HTML standard defines tags for paragraphs, headings, sections, lists, and other structural elements found in news articles and books.
HTML doesn’t define tags for mazes or for cells in mazes. That’s not its application. But HTML is different from Maze+XML or Collection+JSON in that it’s easy to use HTML outside of its application. HTML 4 defines three generic attributes that we can use to add application-level semantics not defined in the HTML standard. (HTML 5 defines a few more, which I’ll cover later.)
The Alternative to Hypermedia Is Media
HTML’s Limits
The Hypertext Application Language
HTML is old, crufty, and designed for human-readable documents. Several new hypermedia formats have emerged in reaction to HTML, formats designed specifically for use in web APIs. The Hypertext Application Language (HAL) is a new format that takes the fundamental concept of HTML—the hyperlink—and ruthlessly prunes away everything else. I think it prunes too much, but it’s a good example of a general hypermedia language that doesn’t have HTML’s historical baggage. Let’s see how it works.
Siren
The Semantic Challenge: How Are We Doing?
Let’s recap the situation as it stands. We have a client-server Internet protocol, HTTP, which assigns very general meanings to different kinds of requests: GET, POST, PUT, and so on.
We have the idea of hypermedia, which allows the server to tell the client which HTTP requests it might want to make next. This frees the client from having to know the shape of the API ahead of time.
Chapter 8. Profiles
- Is there a domain-specific standard for your problem? If so, use it. Document any application-specific extensions (Chapter 5).
- Does your problem fit the collection pattern? If so, adopt one of the collection standards. Define an application-specific vocabulary and document it (Chapter 6).
- If neither of those is true, choose a general hypermedia format. Break down your application into its state transitions. Document those state transitions (Chapter 7).
- At this point, you have your protocol semantics nailed down. The application semantics are all that remain. Are there existing microdata items or microformats that cover your problem domain? If so, use them. Otherwise, define an application-specific vocabulary and document it (Chapter 7).