DeepField CEO and co-founder Dr. Craig Labovitz will speak at the May 20, 2013 Content Delivery Summit on the future of the Internet and Cloud infrastructure.
DeepField CEO and co-founder Dr. Craig Labovitz will speak at the February 20-21, 2013 Net Futures conference on the future of the Internet and Cloud infrastructure.
Welcome to Cyber Monday!
Today is expected to be the biggest online shopping day ever.
Once a minor sideshow to brick-and-mortar Black Friday shopping, Cyber Monday has grown into a multi-billion-dollar cyber bonanza for retailers. Without question, Cyber Monday is the most important online shopping day of the year.
With billions of dollars at stake, companies have raced to build out and promote their online shopping presence.
This blog post explores the rapidly changing online shopping landscape. In particular, we look across millions of Internet users and end devices (phones, mobile apps, web sites, etc.) to analyze which Internet sites and behind-the-scenes infrastructure lure the most traffic and visitors.
Over the last several months, we conducted a large-scale study of online shopping infrastructure. As in other research reports, we use data from an ongoing large scale study of Internet backbone traffic across a large cross section of North America and multiple collaborating infrastructure and Internet providers (although based on a different dataset, more information about our basic methodology is available here). We believe this is the largest ongoing study of its kind.
The below chart shows the results of our study. The graphic lists some of the top Internet retailers and shopping sites along with the average percentage of Internet users that interact with each site either directly or through third-party sites or back-end infrastructure.
Amazon tops the list with an amazing 14% of all Internet users interacting with Amazon-managed shopping sites every day. This includes www.amazon.com as well as a growing empire of Amazon-owned branded sites such as www.myhabit.com. For the purposes of this report, we include all Amazon-owned retail sites under amazon.com with the exceptions of zappos.com and quidsi.com, which are large enough to warrant their own entries. The Amazon entry does not include AWS or other Amazon advertising and cloud infrastructure.
What is truly impressive is how much larger Amazon shopping is compared to any other online site. Amazon is more than one and a half times the size of the next largest shopping competitor, eBay, which enjoys 8.8% of daily Internet users.
While most of the Cyber Monday media attention (including this Wall Street Journal article) focuses on the big names in retail, Cyber Monday also represents a major portion of the yearly sales for tens of thousands of smaller web sites. To demonstrate the growing power of this market segment, we include e-commerce hosting sites like www.shopify.com in the above chart. With an impressive 5.4% market share of daily users, Shopify provides a complete web storefront platform for more than 30,000 sites. As in the brick-and-mortar world, online small businesses account for a major portion of the US e-retail economy.
While not a household name (or at least no one in my family recognized the name), sites like quidsi.com represent new players that have quickly amassed web conglomerates encompassing dozens or hundreds of smaller sites. The Quidsi empire (ranked #6 at 3.1% of daily Internet users) includes dozens of popular shopping sites such as soap.com (where you can literally buy soap as well as hundreds of other daily sundries), diapers.com (baby products), and yoyo.com (toys). Note that Quidsi was acquired by Amazon for a whopping $540 million in 2010.
Our list also includes web shopping comparison sites like shopzilla.com at nearly 3% of all Internet daily users. Originally a failed Wharton student business plan competition entry, Shopzilla (along with similar sites like pricegrabber.com) has grown to direct millions of daily online shoppers to the best deals on other online shopping sites.
For larger retailers, Cyber Monday represents nothing short of an all out price war. Struggling electronics retailers like bestbuy.com have slashed prices and spent millions on promoting their holiday deals.
The graph below shows some of the tangible results of this online war. In the days following Thanksgiving, BestBuy traffic grew from an average of 1.5% of Internet users to more than 5%.
With that, I’ll wrap up my brief tour of Cyber Monday shopping. I’m off to try to find deals on a Nintendo 3DS for my son. Anyone have suggestions?
With the breaking news of Priceline’s astounding $1.8 billion acquisition of Kayak, I got to wondering: how much are travel sites worth?
Is Kayak really nearly twice as valuable as, say, an Instagram?
The below graph shows how Kayak stacks up in popularity against other large travel sites. As in other blog posts, we use data from an ongoing large scale study of Internet backbone traffic across a large cross section of North America and multiple collaborating infrastructure and Internet providers (although based on a different dataset, more information about our basic methodology is available here).
The second column shows the percentage of unique Internet users visiting a site or related third-party content at least once a day on average. The third column, “percent of traffic”, provides the daily percent of global Internet traffic to the sites, and the last column shows 10-day patterns in the percentage of unique Internet visitors.
Based on data across several million Internet users, Expedia dominates the travel industry with 1.5% of all Internet users visiting at least once a day. At just over 1%, Kayak is second followed by HotWire at a distant third.
Oh, and in case you were wondering, Instagram averages a massive 9.5% of Internet users a day and a thousand-fold the volume of traffic.
Apple released the latest version of its eagerly anticipated iOS update Wednesday. In the minutes immediately following the 1pm EST release, massive numbers of iDevice users clearly raced to download the update.
The below graph shows normalized iTunes traffic across a random sample of several North American Internet providers over the last week. On Wednesday, iTunes backbone traffic spiked to consume an amazing average 7-12 percent of backbone traffic. This iTunes surge is roughly equivalent to abruptly switching on a new Internet service on the scale of YouTube or Netflix. The numbers are even larger if we just look at traffic on consumer Internet providers.
Most of the update traffic came from edge CDN infrastructure or direct peering with CDN distribution infrastructure (mainly Akamai). iTunes traffic volumes remained elevated through late Thursday night (EST).
In many Internet backbones, the iOS 6 release traffic spike handily outpaced the surges seen during previous iOS updates. Mostly, the millions of downloads appear to have gone without incident. But in a few networks, the iOS traffic flood overwhelmed backbone circuits, leading to brief outages and periods of degraded Internet performance.
Dr. Labovitz presented his latest research into the application of cloud technologies to real-time network and cloud telemetry. The workshop gathered some of the leading experts in data science to explore cutting edge ideas on big data analytics and computer science.
Earlier this week, Netflix announced plans to deploy their own caching servers in consumer access provider networks around the world. As analyst Dan Rayburn observed, “Netflix aims to lower their CDN costs, rely less on third party CDNs, provide higher quality streaming and most importantly, give network operators more control over the video that flows through their pipes.”
Within the industry, Netflix’s announcement Monday was old news. Netflix has been privately and publicly discussing their caching plans for the better part of a year (e.g. see this April Netflix presentation at RIPE). By the start of this week, almost every North American provider of even modest size had entertained multiple discussions with Netflix about deploying their caches. From a DeepField commercial and research perspective, we’ve enjoyed a front-row seat watching significant Netflix traffic volumes migrate from CDN to Netflix dedicated infrastructure over the last six months.
Apparently, though, Netflix’s caching plan was a surprise to the market which reacted to Netflix’s announcement by abruptly punishing CDN stocks.
In addition to a somewhat peculiar disregard of previously publicly available information, the market also seems to have significant misconceptions about Netflix’s cache impact on the broader CDN ecosystem.
As other analysts have written (e.g. again see Dan Rayburn’s post), Netflix’s caching move likely will not have a significant, long-term impact on the CDN market. Mainly, over the last several years low cost entrants have driven down pricing and commoditized video delivery — the margins for CDNs today on bulk video are terrible. As the chief driver of peak hour bandwidth (along with negotiated bottom of the barrel pricing), Netflix exacerbated the margin issue. Most CDNs would rather focus on higher value (and more profitable) services like analytics, acceleration, and DRM (e.g. see recent Limelight product announcements).
Overall, Netflix follows a trend amongst the select few “hyper giants” to gain market efficiencies through controlling or building their own low-level infrastructure. But, only a handful of companies have the market need and resources to build their own servers and datacenters (e.g. Google and Facebook) or deploy their own caching infrastructure (Netflix). For all other companies, it makes far more sense to use cost effective third-party services like the established CDNs.
Though, I’ll also observe that Netflix is not alone. At last count, on the order of 30 different content companies and CDNs were aggressively promoting their own hardware for deployment in provider networks. In many ways, the edge caching market resembles the early period of Internet peering. The heady days of open interconnection policies (circa 1996) were quickly replaced by ISPs employing objective (well, somewhat) metrics on settlement-free peering. In later years, only carriers or content providers of sufficient scale and economic benefit met these “free” criteria. We are now seeing the same sort of criteria emerge around provider decisions to deploy third-party caching hardware within their networks.
With all of the above said, it is worth looking at the CDN market in North America today. The below graph shows a breakdown of CDNs by percentage of aggregate subscriber traffic volume during the month of April 2012. As noted earlier, traffic volume provides limited insight into profitability of a CDN or the distribution of customers.
The graph uses data from an ongoing research collaboration with multiple large North American Internet providers. We analyze anonymized backbone data encompassing a geographically diverse set of several million subscribers (see earlier blog posts for more details). We believe this collaboration represents the largest ongoing study of cloud and Internet evolution.
Overall, the “big three” (Akamai, Level3 and Limelight) dominate Internet CDN traffic volumes. All other CDNs combined represent less than 10% of the traffic volume and typically focus on market niches such as gaming or low-cost bulk file updates. The exact percentages can vary by several points depending on timeframe and whether you look at things like percentage at peak time or overall average.
Netflix started the year more or less evenly distributing the traffic across the three big CDNs with a trickle of DRM / management traffic also going to Amazon’s EC2 cloud. In the below graphic, we visualize average Netflix traffic across several North American providers. The width of the flow corresponds to the relative percentage of traffic and the color indicates either video traffic (green) or DRM / control (red). Again, I note the exact percentages vary across different networks and time scales.
By the time of Netflix’s cache announcement this week, you can see the significant change in average video traffic distribution across these networks. Now 70% of Netflix traffic on average comes directly from Netflix distributed caches or other infrastructure within Netflix datacenters (AS2906).
I emphasize that Netflix traffic patterns vary significantly across providers. In some networks, the Netflix cache transition appears nearly complete while in others not yet started. The below visualization shows several networks where the migration has already occurred (or is ongoing) and does not represent a synopsis view of all Netflix North American traffic.
So what does this all mean?
As I’ve observed in earlier work, the Internet is in the midst of a fundamental shift from connectivity to content. In the past, carriers saw their role as delivering arbitrary bits between their customers and many millions of web sites. Today, most customers care about a shrinking number of video, cloud and content sources. Our most recent data finds that more than 70% of all Internet traffic (on average) comes from just 150 CDN, hosting, cloud and content companies.
At the same time the number of content sources is shrinking, the volume of “hyper-giant” traffic is growing astronomically (especially HD video). The Internet simply cannot cost-effectively meet these burgeoning traffic demands without additional growth in CDN infrastructure and embedding additional server capacity and content directly into the last-mile network.
This hyper-giant content evolution has changed the way Internet and content providers build their networks and monetize their infrastructure. This is a good thing for the market and consumers (I will save a more detailed discussion on the nuances of these benefits for a later blog post). While we will continue to see disintermediation in the market as “hyper giant” companies like Netflix pursue direct relationships with subscriber networks, I also expect the CDNs to play a significant and growing role in the Internet / cloud evolution.
In honor of Facebook’s recent IPO, we present a brief blog post asking the question: how big is Facebook?
At a staggering $100 billion valuation and a reported 900 million users, Facebook is clearly a massive player in the market and Internet economy. From an Internet infrastructure perspective, Facebook also ranks amongst the largest of the “hyper giants”, generating a significant share of global Internet traffic.
With a PE ratio of 95:1, Facebook is also an incredibly expensive stock.
This blog post attempts to put some concrete numbers behind Facebook’s enormous Internet presence and evaluate the company’s valuation in terms of its Internet traffic contribution.
As in previous posts, we use data from an ongoing research collaboration with multiple large North American Internet providers. We analyze anonymized backbone data encompassing a geographically diverse set of several million subscribers. More details on the research methodologies used in our prior work are available here. We believe this is the largest ongoing study of its kind.
On average, our analysis finds Facebook contributes nearly one percent of all Internet traffic (the actual number is 0.75%). This includes traffic both to Facebook’s private datacenters as well as third-party edge CDN caches (over 85% of Facebook traffic relies on CDNs).
While one percent is an awesomely huge number, the really, really impressive statistic is 45%. More specifically, we estimate 45% of all Internet subscribers send traffic to Facebook servers at least once every day. This includes traffic sent directly to www.facebook.com as well as the indirect connections made by tens of thousands of third-party web sites that include Facebook content or APIs (most users are likely unaware of all the traffic their browsers send to Facebook when they visit these third-party sites).
Given our estimate of Facebook’s traffic volume and today’s IPO price, we now calculate the first ever estimate of the value of Facebook’s Internet traffic.
We first use data from Cisco to estimate the overall size of Internet traffic (37 Exabytes per month). At 0.75% of that total, Facebook uses roughly 824,000 Mbps of bandwidth continuously. Putting the company’s $104.2B valuation in terms of this bandwidth gives a staggering $124,000 per Mbps. (Amazing, considering that over the course of each day there are over 9,000,000,000 Megabytes delivered.)
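The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope check: it assumes decimal units (1 Exabyte = 10^18 bytes) and a 30-day month, neither of which is spelled out here, and the function and parameter names are made up for illustration. Under those assumptions the results land close to, though not exactly on, the rounded figures quoted above.

```python
# Assumptions (not stated in the post): decimal units and a 30-day month.
EXABYTE = 10**18          # bytes
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30


def facebook_traffic_value(monthly_exabytes=37, share=0.0075, valuation_usd=104.2e9):
    """Convert a share of monthly Internet traffic into Mbps and dollars per Mbps."""
    bytes_per_month = monthly_exabytes * EXABYTE * share
    bytes_per_day = bytes_per_month / DAYS_PER_MONTH
    # Average sustained rate: bytes/day -> bits/second -> megabits/second.
    mbps = bytes_per_day * 8 / SECONDS_PER_DAY / 1e6
    return {
        "megabytes_per_day": bytes_per_day / 1e6,
        "mbps": mbps,
        "usd_per_mbps": valuation_usd / mbps,
    }


if __name__ == "__main__":
    for name, value in facebook_traffic_value().items():
        print(f"{name}: {value:,.0f}")
```

Changing the month length or the unit convention shifts the sustained bandwidth by a few percent, which is roughly the size of the gap between this sketch and the rounded numbers in the text.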
Overall, an amazing company with an equally astounding market valuation and now very, very valuable traffic.
In recent years, Amazon has become nearly synonymous with cloud. Hundreds of major Internet services like DropBox, Netflix, and Instagram leverage AWS for all or major portions of their infrastructure. The Amazon cloud is so important that outages make the cover of the Wall Street Journal.
In this blog, we ask the question how big is Amazon’s cloud?
Amazon is clearly big.
For example, Amazon’s press releases like to tout highly technical statistics like the rate of S3 requests (650,000 per second!) and number of objects (900 billion!). Somewhat more insightfully, outside analysts have estimated the number of servers and revenue ($200 million in 2010). But none of this really gives a picture of Amazon’s growing role underpinning the global Internet economy.
So with the collaboration of several network provider research partners, we conducted one of the largest studies of its kind, analyzing multiple weeks’ worth of network data to AWS from a broad cross section of several million Internet end-users (mainly in North America). Our goal was to characterize AWS traffic, understand the major companies using AWS infrastructure, and ultimately gauge the importance of AWS to the Internet infrastructure and daily services / browsing of end-users.
The below chart summarizes some of our key findings.
One way to gauge the importance of Amazon is to ask how frequently a typical Internet user will visit a web site based on Amazon infrastructure. The answer: an amazing 1/3 of all users every day. This number is all the more impressive when you consider that our data includes millions of users and end devices of limited scope or activities, such as users who only check mail and home game consoles.
[Note: Since our study focused on subscriber traffic, we excluded servers (such as consumers hosting web sites) and Internet "background noise" including the nearly constant barrage of scanning / intrusion attempts from China, botnets, machine-to-machine communication for software updates, etc. Though a different dataset, our earlier academic papers provide more background on related methodology].
Traffic volume provides another metric, albeit indirect, of Amazon’s growing Internet presence. As of April 2012, Amazon contributes more than one percent of all consumer Internet traffic in North America. This is a huge number given that Amazon, unlike, say Google, does not typically host massive video content. Instead, this one percent represents the broad reach of Amazon infrastructure across hundreds of client companies. By comparison, we found all of Google’s sprawling YouTube infrastructure contributed six percent of Internet traffic in 2010.
Finally, we looked at Amazon’s growing content distribution network (CDN). Over the last several years, CDNs have evolved as the workhorse of the Internet, delivering the majority of images, video and other content to end users. Since its launch in 2008, Amazon’s CloudFront CDN and S3 distributed storage services have steadily gained in popularity. As of today, Amazon ranks as the fourth largest CDN by traffic volume (trailing behind Akamai, Limelight and Level3).
Now on to our final question: what companies are using Amazon cloud infrastructure?
In the below table, we show the 40 largest corporate users of Amazon’s cloud infrastructure (contact us for a complete list).
As an estimate of the importance of AWS to each company, we calculated the average percentage of all subscriber AWS connections that access one or more of each site’s AWS components each day. So, for example, in the top spot, 21% of subscriber connections to AWS go to truste.com. Like many of the top AWS corporate users, truste.com is an advertising / analytics company (as are InviteMedia, Chartbeat, Evidon, etc.). Although most consumers remain blissfully unaware, almost every web page they visit is tracked, analyzed and scored by dozens of analytics and marketing companies (a large number of them using Amazon infrastructure).
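One way to read this metric is sketched below. The record format (day, subscriber, company), the function name, the use of unique subscribers as a stand-in for "subscriber connections", and the choice to average daily percentages over all observed days are assumptions for illustration, not a description of the actual measurement pipeline.

```python
from collections import defaultdict


def daily_aws_share(records):
    """Average daily percentage of AWS-using subscribers that touch each company.

    records: iterable of (day, subscriber_id, company) tuples, one per observed
    connection to an AWS-hosted component (hypothetical format).
    """
    subs_by_day = defaultdict(set)                        # day -> subscribers seen on AWS
    subs_by_company = defaultdict(lambda: defaultdict(set))  # company -> day -> subscribers

    for day, subscriber, company in records:
        subs_by_day[day].add(subscriber)
        subs_by_company[company][day].add(subscriber)

    n_days = len(subs_by_day)
    shares = {}
    for company, per_day in subs_by_company.items():
        # Days on which a company saw no subscribers contribute 0% to its average.
        total = sum(
            100.0 * len(subs) / len(subs_by_day[day]) for day, subs in per_day.items()
        )
        shares[company] = total / n_days
    return shares
```

A subscriber who touches several companies on the same day counts toward each of them, so the per-company percentages are not expected to sum to 100.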
Many of the companies above are familiar consumer names like DropBox, Netflix, Instagram and Pinterest. Others, like Heroku, provide behind-the-scenes platform as a service (PaaS) to hundreds of other companies running cloud applications. And still many other companies (including my own) use Amazon infrastructure for their internal enterprise applications and back-office support.
Overall, Amazon enjoys a commanding lead in the much ballyhooed, mind-blowingly large $200 billion anticipated cloud computing market. But the war for cloud dominance is just beginning. Companies like Rackspace, CSC, Microsoft and Google are investing billions in datacenters and software to compete. In upcoming blogs, we’ll explore the infrastructure and Internet footprint of some of these other large cloud players.
Once, a long, long time ago, P2P threatened the foundation of the Internet (this was way back in 2007). The seemingly insatiable growth of P2P video and music downloads clogged backbones (30-50% of Internet traffic in many networks) and overwhelmed international circuits as quickly as they were installed. News reports warned of the upcoming bandwidth apocalypse and the imminent collapse of the Internet.
But the collapse never came and then, strangely, things changed…
Well, really four things changed:
- Providers began throttling some consumer traffic
- Dramatically lower prices and the growing convenience of commercial content provided a legitimate alternative to P2P (e.g. iTunes, Netflix)
- RIAA court actions combined with growing user awareness of the susceptibility of P2P to monitoring had a chilling effect on P2P usage
- The rise of direct download sites (e.g. MegaUpload) provided a far more convenient and less legally risky alternative to P2P
As a result, by 2010 magazine and research reports heralded the death of P2P. From a high 40-50% of all Internet traffic, P2P had plummeted to 20-30% of Internet bandwidth and was still declining through the end of 2010.
But the Internet had not seen the last of P2P…
As a follow-on to our filesharing study earlier this year, we collaborated with several research partners to look at P2P traffic over a three month period. The dataset encompassed a sampling of a couple of million subscribers in North America.
While the size and duration of the study are limited, the early results suggest some interesting new trends in P2P traffic.
Of most interest, P2P is likely no longer declining. Over the three month period, P2P traffic levels remained mostly constant or exhibited a slight increase. In the below figure, we show filesharing as an average percentage of Internet traffic. Note that P2P contributed more than five times the volume of direct download filesharing traffic during our study. Overall, we estimate filesharing (both direct download and P2P) represents 15-20% of Internet traffic in North America today.
What explains the resurgence of P2P?
Simply put, changes in Internet technology, economics, and legal environment breathed new life into the aging protocol. A growing number of commercial services embraced P2P as a distribution mechanism for large content like game updates (e.g. World of Warcraft) and videos. CDNs like Akamai (via Red Swoosh acquisition) adopted P2P as part of their offerings.
On the legal side, the recent widely publicized MegaUpload takedown refocused attention on less centralized forms of filesharing (i.e. P2P). Similarly, improvements in P2P technology coupled with a growth in filesharing file size from content like Blu-ray video also led many users to revisit P2P.
But, potentially the most significant change to P2P is the cloud.
In 2007, P2P mostly meant traffic exchanged between dorm rooms and consumer home PCs. Notably, this P2P traffic was slow, unreliable and susceptible to RIAA snooping and provider throttling. Basically, a pain to use.
But then a “cloud” variant of P2P evolved. Dozens of companies began offering hosted P2P “seed boxes”. For a low monthly price, users can purchase a fully pre-configured, high speed P2P box hosted in large datacenters.
The ads in the below graphic give a hint of the wide range of cloud P2P options and benefits.
Unlike dorm or home PCs, these seedboxes have high-speed GigE interfaces (great P2P ratios!) with no provider throttling, come with guarantees of anonymity and are conveniently located outside the US (and RIAA) jurisdiction. With a cloud seedbox, users are free to amass huge content libraries accessible via ftp or ssh at their convenience.
In a similar vein, hundreds of personal VPN companies sprang up offering to protect end-user activities from the prying eyes of their providers. Many of these companies, such as BTGuard and BolehVPN, explicitly market their VPN as a secure P2P accelerator.
Like direct download, most P2P seedbox companies use a small number of datacenter providers, including OVH and LeaseWeb. In online forums, P2P users debate the relative merits of different seedbox hosting providers.
While exact volumes of seedbox P2P traffic are difficult to estimate, we do note that companies like OVH and Leaseweb consistently rank amongst the hypergiant top contributors of Internet traffic in the world.
DeepField president and co-founder Dr. Craig Labovitz gave a follow-up talk to his 2009 Hyper-Giants paper. This talk explored the other 50% of Internet traffic including the surprising CDN and hosting infrastructure behind most of the world’s file sharing and P2P traffic.
On January 18, 2012 global file sharing traffic collapsed. In a series of coordinated raids, US and New Zealand authorities seized thousands of MegaUpload servers and arrested its founder (at his own birthday party, no less).
As the largest file sharing service on the Internet, MegaUpload downloads represented 30-40% of all file sharing. In the space of an hour, Internet traffic globally plummeted by an astounding 2-3%. Press releases heralded a major blow to the theft of intellectual property.
So what happened to Internet file sharing traffic after the MegaUpload arrests?
First, some definitions. As the New York Times observed, “file sharing sites” (particularly those focused on distribution of copyright infringing content) can be difficult to distinguish from the dozens of legitimate sites helping enterprises and consumers share internal documents, homework, and the like.
The web sites for copyright protected and legal file sharing look nearly identical with similar graphics, sales messaging, and perhaps ironically (or cynically), DMCA policies and warnings against illegal file sharing. The only exception was MegaUpload which made little effort to disguise its true business focus (in retrospect, possibly a mistake).
But if you spend a few minutes searching to download the latest Hollywood movie release (or movies not even released yet), patterns quickly emerge. File sharing search sites like FilesTube, RapidManiac, and Filesbay link to many dozens of file sharing providers, but generally not, say, DropBox nor Box.net. (In the above example, I searched for “Man on a Ledge” — which you should not download from FilesTube if for no other reason than it’s a terrible movie).
In our study, we were particularly interested in the infrastructure behind file sharing, i.e. the hosting / colo facilities, payment partners, etc. The conventional wisdom is that file sharing is distributed across huge swaths of the Internet — basically everywhere.
In fact, though there are hundreds of file sharing sites, an extremely small number of co-location providers (six of them) provide infrastructure to these sites that generate more than 80% of all Internet file sharing traffic. Like other niche industries, file sharing has evolved with a specialized ecosystem / cyber supply chain.
The below graph shows the Internet’s file sharing topology in the early hours of January 18, 2012. The links represent North American Internet file sharing traffic where the width of each link is proportional to the traffic volume. Green indicates traffic to the file sharing sites and red is traffic to the hosting or co-location provider. Note that the different file sharing sites share much of the same Internet infrastructure and hosting companies (namely LeaseWeb, NForce, Carpathia, Choopa, and Softlayer).
On January 18, MegaVideo was clearly the king with 34% of all file sharing traffic. In turn, most MegaVideo servers leveraged US based servers in Carpathia Hosting with some traffic going to Leaseweb servers in the Netherlands and other European providers / facilities. According to the indictment, the gigantic MegaUpload sprawled over more than 1000 servers and 25 petabytes of data in Carpathia facilities (with another 700 MegaUpload servers in Leaseweb hosting centers).
The next graphic shows Internet file sharing traffic topology several hours later on January 19, 2012. Overall, a significant re-allocation of Internet file sharing traffic. MegaVideo is gone. Sites like PutLocker have gained significant marketshare.
The main impact of the MegaUpload takedown?
Well, file sharing has not gone away. It did not even decrease much in North America.
Mainly, file sharing became staggeringly less efficient. Instead of terabytes of North American MegaUpload traffic going to US servers, most file sharing traffic now comes from Europe over far more expensive transatlantic links.
We’ll be at NANOG 54 next month in San Diego.
I’m presenting data from a fun side project on the less well-publicized and understood side of Internet infrastructure. The working title: The Other Internet Infrastructure: File Sharing, P2P and Adult Traffic.
A bit from the abstract:
In previous work, we looked at the rapidly evolving “Hyper-Giants”, or the 150 large content and hosting networks (e.g. Facebook, Google, etc.) that now contribute an amazing 50% of all Internet traffic globally.
This talk looks at the other 50% of traffic.
Specifically, FileSharing, P2P and adult traffic represent a massive and growing portion of Internet traffic globally (as well as a sizable economic activity). Conventional wisdom holds that this set of “other” traffic permeates the Internet, with tens of thousands of companies running servers in every country, nook and cranny of the network.
But for the most part, we show this is not true. Instead, we find a couple hundred small companies quietly manage these thousands of domain names and an even smaller number of specialized hosting, CDN, analytics and advertisement companies provide the infrastructure. In the case of file sharing, we show that four small hosting companies provide the infrastructure that accounts for more than 80% of all file sharing traffic globally.