Problem
The FA holds large amounts of data from many different sources, some internal to the organisation and some external. Even modest volumes of data can be difficult to manage, both in how they are physically stored and in how they are analysed. As volumes grow, these problems grow with them: the more data there is, the harder it becomes to extract value from it.
Solution
External and internal sources have been made available through APIs so that their data can be transferred to cloud storage, where the Google Cloud infrastructure comes into its own. BigQuery is used as the data lake: it collects information from the various sources and makes it available to the various analysis systems.
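The collection step can be illustrated with a small normalisation function. This is a sketch under stated assumptions, not the FA's implementation: the source names (`match_feed`, `internal_app`), the field names and the row schema are all hypothetical. It shows one common way to bring records from differently shaped sources into a single row format before loading them into a BigQuery table.

```python
# Sketch: normalise records from different (hypothetical) sources into one
# row shape for the data lake. Source names, fields and schema are assumptions.
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def from_match_feed(record: Dict[str, Any]) -> Row:
    # Hypothetical external feed that nests the player under "player".
    return {
        "player_id": str(record["player"]["id"]),
        "metric": "minutes_played",
        "value": float(record["minutes"]),
    }


def from_internal_app(record: Dict[str, Any]) -> Row:
    # Hypothetical internal record with flat, camelCase fields.
    return {
        "player_id": str(record["playerId"]),
        "metric": record["metricName"],
        "value": float(record["metricValue"]),
    }


# One adapter per source; adding a source means adding one function.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], Row]] = {
    "match_feed": from_match_feed,
    "internal_app": from_internal_app,
}


def to_lake_rows(source: str, records: List[Dict[str, Any]],
                 loaded_at: datetime) -> List[Row]:
    """Normalise raw records and tag each row with its origin and load time."""
    adapt = ADAPTERS[source]
    rows = []
    for record in records:
        row = adapt(record)
        row["source"] = source
        row["loaded_at"] = loaded_at.isoformat()
        rows.append(row)
    return rows


if __name__ == "__main__":
    ts = datetime(2020, 1, 1, tzinfo=timezone.utc)
    print(to_lake_rows("match_feed", [{"player": {"id": 7}, "minutes": 90}], ts))
```

Rows in this shape could then be streamed into a BigQuery table, for example with the `insert_rows_json` method of the `google-cloud-bigquery` client.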
The external sources are accessed through Google App Engine (GAE) and Cloud Functions. These processes first place the data in Cloud Datastore, from where Dataflow transforms it and loads it into the data lake. The whole process is orchestrated by Cloud Composer.
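Cloud Composer is a managed Apache Airflow service, so orchestrating this process amounts to running each task only after the tasks it depends on have finished. The sketch below uses Python's standard `graphlib` module (Python 3.9+) rather than Airflow itself, to show the ordering such a DAG would enforce; the task names are hypothetical stand-ins for the steps described above.

```python
# Stdlib sketch of the dependency order a Composer (Apache Airflow) DAG
# would enforce. Task names are hypothetical; this is not an Airflow DAG.
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Each task maps to the set of tasks that must finish before it starts.
PIPELINE: Dict[str, Set[str]] = {
    "fetch_via_app_engine": set(),       # external sources via GAE
    "fetch_via_cloud_functions": set(),  # external sources via Cloud Functions
    "stage_in_datastore": {"fetch_via_app_engine", "fetch_via_cloud_functions"},
    "dataflow_load_to_bigquery": {"stage_in_datastore"},
}


def run_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    # Both fetch tasks come before staging, and staging before the load.
    print(run_order(PIPELINE))
```

In an actual Composer DAG the same edges would be declared between Airflow operators, and Airflow would additionally take care of scheduling, retries and monitoring.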
Data from the data lake is in turn transformed by Dataflow into Datastore entities. Because Datastore is designed for fast lookups by key, whereas BigQuery is designed for large analytical queries, this lets the APIs serve requests efficiently. Internal sources have been built with App Maker and Cloud SQL. Data Studio is used so that users can run reports themselves.
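The step from data lake to Datastore can be sketched as a pre-aggregation: rows are grouped into one document per key, so that an API request becomes a single lookup instead of a warehouse query. This is an assumption about what the transform might look like, not the documented pipeline; the `PlayerSummary` key prefix and the field names are hypothetical, and a plain dictionary stands in for Datastore.

```python
# Sketch: reshape data lake rows into one pre-aggregated document per key.
# Key prefix and field names are hypothetical; a dict stands in for Datastore.
from collections import defaultdict
from typing import Any, Dict, Iterable, Optional


def build_serving_docs(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Sum each metric per player and return {key: document}."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row["player_id"]][row["metric"]] += row["value"]
    docs = {}
    for player_id, metrics in totals.items():
        key = f"PlayerSummary/{player_id}"  # hypothetical kind/name key
        docs[key] = {"player_id": player_id, "metrics": dict(metrics)}
    return docs


def get_player(docs: Dict[str, Dict[str, Any]],
               player_id: str) -> Optional[Dict[str, Any]]:
    """What an API handler would do: one lookup by key, no scan or join."""
    return docs.get(f"PlayerSummary/{player_id}")


if __name__ == "__main__":
    lake_rows = [
        {"player_id": "7", "metric": "goals", "value": 1.0},
        {"player_id": "7", "metric": "goals", "value": 2.0},
        {"player_id": "9", "metric": "assists", "value": 1.0},
    ]
    docs = build_serving_docs(lake_rows)
    print(get_player(docs, "7"))  # {'player_id': '7', 'metrics': {'goals': 3.0}}
```

The design choice is to pay the aggregation cost once, in the batch job, so that each API call only reads a ready-made document.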
G Suite has been used to open up the data lake through Google Sheets, which can be accessed by a larger number of analysts.
Result
Analysts now have an environment in which they can analyse data combined from several sources. APIs were made available so that a third party could build the player selection application. Reports requested by the business can now be supplied on an ad hoc basis.