In recent years we have witnessed an explosion of Internet-connected applications. Whether it is a new mobile app to find your soulmate, the latest wearable to monitor your vitals, or an industrial solution to detect corrosion, our life is becoming packed with connected systems.
How is the Internet changing because of this shift? This blog provides an overview of how Internet traffic is evolving as Application Programming Interfaces (APIs) have taken the centre stage among the communication technologies. With help from the Cloudflare Radar team, we have harnessed the data from our global network to provide this snapshot of global APIs in 2021.
The huge growth in API traffic comes at a time when Cloudflare has been introducing new technologies that protect applications from nascent threats and vulnerabilities. The release of API Shield with API Discovery, Schema Validation, mTLS and API Abuse Detection has provided customers with a set of tools designed to protect their applications and data based on how APIs work and their challenges.
We are also witnessing increased adoption of new protocols. Among encryption protocols, for example, TLS v1.3 has become the most used protocol for APIs on Cloudflare while, for transport protocols, we saw an uptake of QUIC and gRPC (Cloudflare support announced in 2018 and 2020 respectively).
In the following sections we will quantify the growth of APIs and identify key industries affected by this shift. We will also look at the data to better understand the source and type of traffic we see on our network including how much malicious traffic our security systems block.
Why is API use exploding?
By working closely with our customers and observing the broader trends and data across our network in application security, we have identified three main trends behind API adoption: how applications are built is changing, API-first businesses are thriving, and finally machine-to-machine and human-to-machine communication is evolving.
APIs are also enabling companies to innovate their business models. Across many industries there is a trend of modularizing complex processes by integrating self-contained workflows and operations. The product has become the service delivered via APIs, allowing companies to scale and monetize their new capabilities. Financial Services is a prime example where a monolithic industry with vertically integrated service providers is giving way to a more fragmented landscape. The new Open Banking standard (PSD2) is an example of how small companies can provide modular financial services that can be easily integrated into larger applications. Companies like TrueLayer have productized APIs, allowing e-commerce organizations to onboard new sellers to a marketplace within seconds or to deliver more efficient payment options for their customers. A similar shift is happening in the logistics industry as well, where Shippo allows the same e-commerce companies to integrate with services to initiate deliveries, print labels, track goods and streamline the returns process. And of course, everything is powered by APIs.
Finally, the increase of connected devices such as wearables, sensors and robots are driving more APIs, but another aspect of this is the way manual and repetitive tasks are being automated. Infrastructure-as-Code is an example of relying on APIs to replace manual processes that have been used to manage Internet Infrastructure in the past. Cloudflare is itself a product of this trend as our solutions allow customers to use services like Terraform to configure how their infrastructure should work with our products.
The data presented in the following paragraphs is based on the total traffic proxied by Cloudflare and traffic is classified according to the Content-Type header generated in the response phase. Only requests returning a 200 response were included in the analysis except for the analysis in the ‘Security’ section where other error codes were included. Traffic generated by identified bots is not included.
When looking at trends, we compare data from the first week of February 2021 to the first week of December 2021. We chose these dates to compare how traffic changed over the year but excluding January which is affected by the holiday season.
Specifically, API traffic is labelled based on responses with types equal
text/xml, while Web accounts for
text/plain; Binary are
application/octet-stream; Media includes all image types, video and audio.
Other catches everything that doesn’t clearly fall into the labels above, which includes
unknown. Part of this traffic might be API and the categorisation might be missing due to the client or server not adding a Content-Type header.
API use in 2021
We begin by examining the current state of API traffic at our global network and the types of content served. During the first week of December 2021, API calls represented 54% of total requests, up from 52% during the first week of February 2021.
When looking at individual data types, API was by far the fastest growing data type (+21%) while Web only grew by 10%. Media (such as images and videos) grew just shy of 15% while binary was the only traffic that in aggregate experienced a reduction of 6%.
In summary, APIs have been one of the drivers of the traffic growth experienced by the Cloudflare network in 2021. APIs account for more than half of the total traffic generated by end users and connected devices, and they’re growing twice as fast as traditional web traffic.
New industries are contributing to this increase
We analysed where this growth comes from in terms of industry and application types. When looking at the total volume of API traffic, unsurprisingly the general Internet and Software industry accounts for almost 40% of total API traffic in 2021. The second-largest industry in terms of size is Cryptocurrency (7% of API traffic) followed by Banking and Retail (6% and 5% of API traffic respectively).
The following chart orders industries according to their API traffic growth. Banking, Retail and Financial Services have experienced the largest year-on-year growth with 70%, 51% and 50% increases since February 2021, respectively.
The growth of Banking and Financial Services traffic is aligned with the trends we have observed anecdotally in the sector. The industry has seen the entrance of a number of new platforms that aggregate accounts from different providers, streamline transactions, or allow investing directly from apps, all of which rely heavily on APIs. The new “challenger banks” movement is an example where newer startups are offering captivating mobile services based on APIs while putting pressure on larger institutions to modernise their infrastructure and applications.
A closer look at the API characteristics
Generally speaking, a RESTful API request is a call to invoke a function. It includes the address of a specific resource (the endpoint) and the action you want to perform on that resource (method). A payload might be present to carry additional data and HTTP headers might be populated to add information about the origin of the call, what software is requesting data, requisite authentication credentials, etc. The method (or verb) expresses the action you want to perform, such as retrieve information (GET) or update information (POST).
It’s useful to understand the composition and origin of API traffic, such as the most commonly used methods, the most common protocol used to encode the payload, or what service generates traffic (like Web, mobile apps, or IoT). This information will help us identify the macro source of vulnerabilities and design and deploy the best tools to protect traffic.
The vast majority of API traffic is the result of POST or GET requests (98% of all requests). POST itself accounts for 53.4% of all requests and GET 44.4%. Generally speaking, GET tends to transfer sensitive data in the HTTP request header, query and in the response body, while POST typically transfers data in the request header and body. While many security tools apply to both of these types of calls, this distinction can be useful when deploying tools such as API Schema Validation (request and response) or Data Loss Prevention/Sensitive Data Detection (response), both launched by Cloudflare in March 2021.
Payload encoding review
API payloads encode data using different rules and languages that are commonly referred to as transport protocols. When looking at the breakdown between two of the most common protocols, JSON has by far the largest number of requests (~97%) while XML has a smaller share of requests as it still carries the heaviest traffic. In the following figure, JSON and XML are compared in terms of response sizes. XML is the most verbose protocol and the one handling the largest payloads while JSON is more compact and results in smaller payloads.
Since we have started supporting gRPC (September 2020), we have seen a steady increase in gRPC traffic and many customers we speak with are in the planning stages of migrating from JSON to gRPC, or designing translation layers at the edge from external JSON callers to internal gRPC services.
Source of API traffic
We can look at the HTTP request headers to better understand the origin and intended use of the API. The User-Agent header allows us to identify what type of client made the call, and we can divide it into three broader groups: “browser”, “non-browser” and “unknown” (which indicates that the User-Agent header was not set).
About 38% of API calls are made by browsers as part of a web application built on top of backend APIs. Here, the browser loads an HTML page and populates dynamic fields by generating AJAX API calls against the backend service. This paradigm has become the de-facto standard as it provides an effective way to build dynamic yet flexible Web applications.
The next 56% comes from non-browsers, including mobile apps and IoT devices with a long tail of different types (wearables, connected sport equipment, gaming platforms and more). Finally, approximately 6% are “unknown” and since well-behaving browsers and tools like curl send a User-Agent by default, one could attribute much of this unknown to programmatic or automated tools, some of which could be malicious.
A key aspect of securing APIs against snooping and tampering is encrypting the session. Clients use SSL/TLS to authenticate the server they are connecting with, for example, by making sure it is truly their cryptocurrency vendor. The benefit of transport layer encryption is that after handshaking, all application protocol bytes are encrypted, providing both confidentiality and integrity assurances.
Cloudflare launched the latest version of TLS (v1.3) in September 2016, and it was enabled by default on some properties in May 2018. When looking at API traffic today, TLS v1.3 is the most adopted protocol with 55.9% of traffic using it. The vulnerable v1.0 and v1.1 were deprecated in March 2021 and their use has virtually disappeared.
|Transport security protocol||December 2021|
The protocol that is growing fastest is QUIC. While QUIC can be used to carry many types of application protocols, Cloudflare has so far focused on HTTP/3, the mapping of HTTP over IETF QUIC. We started supporting draft versions of QUIC in 2018 and when QUIC version 1 was published as RFC 9000 in May 2021, we enabled it for everyone the next day. QUIC uses the TLS 1.3 handshake but has its own mechanism for protecting and securing packets. Looking at HTTP-based API traffic, we see HTTP/3 going from less than 3% in early February 2021 to more than 8% in December 2021. This growth broadly aligns RFC 9000 being published and during the periodHTTP/3 support being stabilized and enabled in a range of client implementations.
Mutual TLS, which is often used for mobile or IoT devices, accounts for 0.3% of total API traffic. Since we released the first version of mTLS in 2017 we’ve seen a growing number of inquiries from users across all Cloudflare plans, as we have recently made it easier for customers to start using mTLS with Cloudflare API Shield. Customers can now use Cloudflare dashboard to issue and manage certificates with one-click avoiding all the complexity of having to manage a Private Key Infrastructure and root certificates themselves.
Finally, unencrypted traffic can provide a great opportunity for attackers to access plain communications. The total unencrypted API traffic dropped from 4.6% of total requests in early 2021 to 2.6% in December 2021. This represents a significant step forward in establishing basic security for all API connections.
Given the huge amount of traffic that Cloudflare handles every second, we can look for trends in blocked traffic and identify common patterns in threats or attacks.
When looking at the Cloudflare security systems, an HTML request is twice as likely to be blocked than an API request. Successful response codes (200, 201, 301 and 302) account for 91% of HTML and 97% of API requests, while 4XX error codes (like 400, 403, 404) are generated for 2.8% of API calls as opposed to 7% of HTML. Calls returning 5XXs codes (such as Internal Server Error, Bad Gateway, Service Unavailable) are almost nonexistent for APIs (less than 0.2% of calls) while are almost 2% of requests for HTML.
The relatively larger volume of unmitigated API requests can be explained by the automated nature of APIs, for example more API calls are generated in order to render a page that would require a single HTML request. Malicious or malformed requests are therefore diluted in a larger volume of calls generated by well-behaving automated systems.
We can further analyse the frequency of specific error codes to get a sense of what the most frequent malformed (and possibly malicious) requests are. In the following figure, we plot the share of a particular error code when compared to all 4XXs.
We can identify three groups of issues all equally likely (excluding the more obvious “404 Not Found” case): “400 Bad Request” (like malformed, invalid request), “429 Too Many Requests” (“Rate Limiting”), and the combination of Authentication and Authorization issues (“403 Forbidden” and “401 Unauthorized”). Those codes are followed by a long tail of other errors, including “422 Unprocessable Entity”, “409 Conflict”, and “402 Payment Required”.
This analysis confirms that the most common attacks rely on sending non-compliant requests, brute force efforts (24% of generated 4XXs are related to rate limiting), and accessing resources with invalid authentication or permission.
We can further analyse the reason why calls were blocked (especially relative to the 400s codes) by looking at what triggered the Cloudflare WAF. The OWASP and the Cloudflare Managed Ruleset are tools that scan incoming traffic looking for fingerprints of known vulnerabilities (such as SQLi, XSS, etc.) and they can provide context on what attack was detected.
A portion of the blocked traffic has triggered a managed rule for which we can identify the threat category. Although a malicious request can match multiple categories, the WAF assigns it to the first threat that is identified. User-Agent anomaly is the most common reason why traffic is blocked. This is usually triggered by the lack of or by a malformed User-Agent header, capturing requests that do not provide enough credible information on what type of client has sent the request. The next most common threat is cross-site scripting. After these two categories, there is a long tail of other anomalies that were identified.
More than one out of two requests we process is an API call, and industries such as Banking, Retail and Financial Services are leading in terms of adoption and growth.
Furthermore, API calls are growing twice as fast as HTML traffic, making it an ideal candidate for new security solutions aimed at protecting customer data.