Case study
How to create your own content analytics system, when there is no similar product on the market?
Learn how you can implement a content analytics system to measure the performance of your published content.
About the company
The company is a universal publisher, one of the leaders of the book market in Europe.
The publisher’s strategy is increasing its net income from book sales in the online channel by optimizing investments in digital marketing.
The problem in question
The publisher understands the value of data that can help evaluate the effectiveness of the content on the website to create the demand at the top of the funnel to attract customers to the publisher's new products. This data is scattered and located in different systems. There was an attempt to combine this data using Power BI into a single dashboard. But since the infrastructure for collecting and processing data in a single repository of analytical data (Data Warehouse) was not provided and implemented at the design stage, the initially good idea did not translate into a reliable working tool.
The objective
To develop an analytics tool that will help analyze the website content ensure reliable operation at the same time.
The tasks
  1. To automate the collection and integration of the content data from CRM, Google Sheets and Google Analytics based on the published online materials in a single database.
  2. To visualize the content performance data in a dashboard format using Google Data Studio for quick and easy access.
The solution
Prior to starting this project, HOLOWAY had already done a thorough job of interviewing the internal analytics customers to better understand their business objectives and analytics needs.
Step 1. Export from CRM
A single upload of all content in the form of xls-files was made – news, articles, and tests. Further, a technical specification for the data automatic upload from CRM directly to Google BigQuery was developed and successfully implemented, and the data source was connected to the dashboard.

  • id - article id
  • url 
  • name - article title
  • active - if the article is active
  • sort - sorting in the catalog
  • date_change 
  • date_create 
  • publisher 
  • date_activity_start / date_activity_end 
  • isTest - if it is a testing article
  • author
  • date_publish 
Step 2. Export of the Publication Plan from Google Sheets
The original publication plan was transformed into a suitable format. The basic requirements are the following: we display the entire list of the publications on one tab. A separate table has been generated from the document. The data is pulled automatically as it is filled.

The detailed field structure:

  • date 
  • name 
  • type 
  • commentary 
  • keys 
  • traffic_plan 
  • goal 
  • responsible 
  • deadline 
  • customer 
  • editors 
  • link 
Step 3. Export of the Promotion Matrix from Google Sheets
The publisher used promotion matrices for internal reports. We were given access to them and asked for minor adjustments – to add a column with expenses for each book and to indicate the correct URL of the article where a link to this book was published. This is a temporary solution, since at the time of the prototype implementation, the article-book links were not implemented. In the future, it is planned to introduce a separate field with the book ISBN in CRM in the “Article” entity, so that the article-book link is even more reliable.

Separate tables are created based on each document in BigQuery. The data is pulled automatically as it is filled. The data from the matrices are combined with articles via a link to the article which is entered in the link_id column. For new entries and all entries starting from January 2020, all links are affixed.

The field structure:

  • date
  • name
  • author
  • isbn
  • link_itd
  • type
  • cost_plan
  • cost_fact
  • link_id
  • commentary
  • cost
Step 4. Export of the Publications Price-List from Google Sheets
The publishing house applies the principle of unit-economy: one division “pays” another for promotion of books and for the content created and posted on the website. These payments for the content services are called the Revenue. To calculate the Revenue for the articles published before January 2020 correctly, the price for publications was added. If the publication is of a certain type, does not belong to the editorial, was published before January 2020, and does not have the cost in the Promotion Matrix, we pull the Revenue from the price list by the type of publication. A separate price_sheet table was added to Google Big Query in the following form: 

  • plan_name – the type of the publication in the plan
  • matrix_name – the type of the publication in the promotion matrix
  • price – the price of the publication
Step 5. Export of data from Google Analytics
With a script developed specially on Google Apps Script, the data was uploaded from June 2019 to May 15, 2020. The data is stored in separate tables. The data is updated daily via Apps Script.

The structure of the data table on user taps in the articles:
  • date 
  • pagepath 
  • sessions 
  • pageviews

The structure of the data table for user visits to the website from the article page:
  • date 
  • landingPagePath 
  • sessions 
  • pageviews

The structure of the data table for transitions from the article page to the book card: 
  • date
  • goalCompletionLocation – the path to the page or a screen name corresponding to the achievement of any goal. Thus, we request the result of reaching the goal; it is different for each data table from GA. 
  • goalPreviousStep1 – achievement of the previous step of the goal
  • goal19Completions – transition to a book card

The structure of the data table for transitions from the article page to the author's card:
  • date
  • goalCompletionLocation 
  • goalPreviousStep1
  • goal15Completions 

The structure of the data table for transitions from the page of the article to the series.
  • date
  • goalCompletionLocation 
  • goalPreviousStep1
  • goal14Completions
The structure of the data table for transitions to the cart from the page of the article.
  • date
  • goalCompletionLocation 
  • goal4Completions
The structure of the data table for receiving revenue from the article page.
  • date
  • landingPagePath
  • transactions - транзакции
  • transactionRevenue
Step 6. Data processing and aggregation
We combine the data from the above sources into Google BigQuery which we use as DWH.
SQL query combines the data from all the above tables creating two entities as a result:
  • A cohort type data table where the data is arranged according to the following principle: the total of the indicators from the date of publication of the article to the date of the last data upload. 
  • A daily data table enabling one to view any indicators within the specified date range.
The estimated metrics from GA are updated daily at 8 am.
The results
Visualization of the received data in the dashboard format consisting of several sheets.
The first page of the dashboard “Publishing materials by publication date”
The specific feature is that the table displays all articles whose publication date falls within the selected range. 

The costs are considered only for those books that have links to the articles in the Promotion Matrices. Only those articles that have links to the articles in the Publication Plan are displayed on the dashboard. Not only the articles on the website are considered, but also the articles from other partner websites. For verification, a special block named "income from matrices" was created.  
The second page of the dashboard “Publishing materials by date of receiving traffic”
The specific feature is that this dashboard sheet displays the data broken down by the date when the traffic was received. 
The articles in the table are represented for the entire available retrospective period – the data since 2008 has been downloaded from CMS. The costs are calculated only for those books that have links to the articles in the Promotion Matrices.

The data is collected from GA, the Publication Plan, and the Matrices for three editorials.
At the bottom of the dashboard there is also a clickable link to its detailed architecture.
The HOLOWAY team helped the publisher to create its own content performance analytics by combining information from several sources in one dashboard. The data is collected from GA, the Publication Plan, and the Matrices for three editorials.

There is no similar system on the market, so we had to create a completely new product based on the wishes of the customer, identified during a detailed interview, and information that has already been collected earlier in the publisher’s systems.

The tasks of transferring the already collected data from one service to another were solved, Google Data Studio service was used for the data visualization. The dashboards are configured so that the customers receive daily data on the effectiveness of the content; for example, this morning one can see information on the materials uploaded yesterday and earlier, which allows them to respond quickly to the changes. The data in the table is represented for the period from June 1, 2019, to the preceeding day (“yesterday”).

All information can be viewed in the context of editions, authors, and types of materials. The data can be downloaded in the form of Google spreadsheets or Excel. 

Thus, the publisher has received a practical tool for the website to link publications with e-commerce transactions and to calculate the monetization of the content articles and improve the effectiveness of content marketing.

The publisher’s team made some important conclusions on the content materials: first of all, they determined the most viral topics that collect organic traffic for further replication. They were able to monitor the performance of the content team which has not decreased even during the period of work with an incomplete staff. They can provide their internal customers with instant analytics on the posted materials. And now they know based on figures that the content is a long-term marketing tool. So, if you create it with high quality and find the right topic, it will attract the right audience at no cost at all for several years in a row, like no advertisement can do.