Data Reporting and the Challenges of Growth

sovrnmarketing // March 3, 2017

Managing Global Round-the-Clock Platform Operations and Rapid Growth

Over the past several weeks, the real-time reporting that publishers have come to expect from Sovrn has experienced delays. We believe strongly in transparency and in continual learning and improvement, so we want to provide a longer explanation of the reasons behind the delays and what we are doing to fix the underlying causes.

It is important to know that, in spite of the reporting delays, there has been zero data loss, thanks to the robust data protection infrastructure Sovrn has in place. All data is and has been intact. The core issues had to do with the processing of that data into publisher dashboards and reporting tables.

To understand the delays in data reporting, it is important to first outline the three tiers that comprise the Sovrn platform.

Tier 1

The delivery tier is a distributed “pod” architecture with points of presence close to the large population centers and to the buyer systems on the demand (advertiser) side. Proximity to media centers such as New York, Los Angeles, Chicago, and London enables extremely fast exchange auction mechanics, advertising creative delivery, and ultimately lower latency for the reader of the site. With increased adoption of new auction dynamics like header bidding, the need to scale for high volume with predictable response times is becoming even more important.

Tier 2

The data tier includes the capability to capture, store, and then process data at extremely high speed. Sovrn’s publisher network exceeds 100,000 sites, where more than 1 billion people consume 90 billion pages of content every month. The systems required to process that volume of data are the same as, or very similar to, those used by large platform companies (think: Netflix, Spotify, Snap, Twitter). The reporting issues publishers have seen were due to processing and network capacity bottlenecks at our primary data centers as data volumes increased in late January.

Tier 3

Finally, the analytics tier includes individual publisher reports. The data tier feeds the analytics tier, so as processing slowed and network capacity became constrained, publishers saw delayed reporting in their dashboards and other reports.

Growth: The Best Kind of Challenge

Sovrn’s data processing layer starts with a data ingest queue, a centralized data store, and real-time and batch data processing pipelines to supply the analytics data. The infrastructure is designed to “scale out” by adding servers as needed. What we discovered was that the data processing architecture we’d deployed made it difficult to perform critical systems maintenance while simultaneously running at the extreme volume we began to encounter in late January. Frankly, we were a victim of growth. We’ve seen an 8x increase in data volume, driven primarily by header bidding implementations but also by video and Signal adoption. When we attempted to perform regular system maintenance and add servers, the infrastructure became unstable. Recovering from this instability required several in-place upgrades, patches, and reconfigurations.

We should reiterate that the ad delivery tier (Tier 1) was never impacted. Moreover, all data was intact. This was fundamentally a bottleneck in the data tier that impacted data processing and resulted in several weeks of slow-populating dashboards and publisher reports.

Key Lessons

Sovrn is a 24×7, always-on system. We operate in North America, Europe, and Asia. It is simply not possible to take the platform down for maintenance. This means that configurations can fall out of date, and systems can become unstable, for lack of regular maintenance (patches, etc.).
The platform requires a more flexible architecture that provides elastic capacity and the ability to remove or replace select components for maintenance without impacting overall processing.
The platform must be able to quickly and efficiently spike capacity to keep up with both current and backfill data processing. As volume grows, the processing capacity required to catch up from data delays can increase dramatically, further slowing things down.
Our commitment to the publisher is to provide timely and transparent communication of causes, impacts, and responses when issues like the above occur.
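
The third lesson above, about catch-up capacity, can be illustrated with some simple arithmetic. The sketch below is a minimal Python example under simplifying assumptions: a constant ingest rate and a constant processing capacity. The function name and the figures are hypothetical and are not drawn from Sovrn's systems. The point it shows is that only the capacity above the live ingest rate works down a backlog, so a small amount of headroom means a very long recovery.

```python
def catch_up_hours(backlog_hours: float, capacity_ratio: float) -> float:
    """Hours needed to clear a backlog while new data keeps arriving.

    backlog_hours:  size of the backlog, measured in hours of incoming data.
    capacity_ratio: processing capacity divided by the current ingest rate
                    (hypothetical figure; 1.0 means "just keeping up").
    """
    if capacity_ratio <= 1.0:
        # No spare capacity: the backlog never shrinks.
        return float("inf")
    # Only the capacity above the live ingest rate reduces the backlog.
    return backlog_hours / (capacity_ratio - 1.0)


if __name__ == "__main__":
    # Compare recovery times for a one-day backlog at different headroom levels.
    for ratio in (1.1, 1.5, 2.0, 4.0):
        hours = catch_up_hours(24, ratio)
        print(f"capacity {ratio:.1f}x ingest: {hours:.1f} h to clear a 24 h backlog")
```

Under these assumptions, doubling capacity relative to ingest clears a backlog in the same time it took to accumulate, while running only slightly above the ingest rate stretches recovery to many times that, which is why the ability to spike capacity matters.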

What Publishers Can Expect Going Forward

Now that the systems are stabilized and reporting is current, we are moving to upgrade the processing capacity and to guarantee that data is available within an hour.

Each third-party component (MapR, Aerospike, etc.) will be certified by its vendor and validated with quality assurance testing, acceptance testing, and parallel processing to confirm data validity. We will deploy these fixes in phases, which allows for publisher feedback and rapid response during deployment.

We believe these investments will continue to provide publishers access to unique, high-value data that is delivered consistently, reliably, and in real time.
