Modern Data Management for VNR

Faster data processing and easier data use with AWS services


VNR Verlag is one of the largest specialist publishers in Germany. VNR makes expert knowledge available to various target groups. Its portfolio includes specialist information in paper form, loose-leaf publications, trade journals, digital information on portals, newsletters and trader services, as well as continuing education and specialist exchange formats such as conferences, congresses and seminars.

Thanks to serverless IT and the advantages of the AWS Cloud, VNR was able to automate standard IT tasks in data management and reduce the associated costs. VNR's IT experts can now focus on their core tasks in machine learning.

Project Overview

Initial Situation

VNR hosts large amounts of customer transaction data in an Oracle database. This data needed to be extracted and transformed into structures suitable for both data warehousing and machine learning. VNR's machine learning experts should then be able to use the data easily for model development, training and retraining, and to provide an endpoint for putting the resulting models to use. In the previous environment, the infrastructure and data flows were not trivial to implement, which kept the machine learning experts from their actual tasks.
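To illustrate the kind of transformation involved, the sketch below maps one raw transaction row into the two target shapes mentioned above: a flat record for the warehouse and a numeric feature vector for machine learning. The field names and derived features are hypothetical, a minimal example rather than VNR's actual schema:

```python
from datetime import datetime

def to_warehouse_record(raw: dict) -> dict:
    """Flatten a raw transaction row (hypothetical export format)
    into a denormalized record suitable for a warehouse table."""
    return {
        "customer_id": raw["cust_id"],
        "order_date": raw["order_ts"][:10],   # keep only the date part
        "product": raw["item"]["name"],
        "revenue_eur": round(raw["qty"] * raw["unit_price"], 2),
    }

def to_ml_features(raw: dict) -> list[float]:
    """Derive a numeric feature vector from the same row, e.g. for a
    churn or recommendation model (features are invented here)."""
    order_dt = datetime.fromisoformat(raw["order_ts"])
    return [
        float(raw["qty"]),
        raw["qty"] * raw["unit_price"],   # order value
        float(order_dt.isoweekday()),     # day-of-week signal
    ]

# Example row in the assumed raw format:
row = {"cust_id": "C-1001", "order_ts": "2021-03-15T09:30:00",
       "item": {"name": "Tax Newsletter"}, "qty": 2, "unit_price": 49.50}
```

Keeping both target shapes derived from the same raw row is what lets a single pipeline feed the warehouse and the ML store consistently.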


Solution

Arvato Systems implemented an automated Extract-Transform-Load (ETL) pipeline with S3 events, Lambda and DynamoDB. Uploading a data chunk from the original Oracle database triggers the transformation process and enables massive parallelism. In several steps, the raw format is converted into the various target formats using transformation templates. The results are then imported into Redshift for data warehousing and into a DynamoDB table for machine learning purposes. The entire setup is scripted as infrastructure-as-code with CloudFormation and integrated into a CI/CD pipeline, including unit testing, deployment into a test environment and finally into a production environment. With Amazon SageMaker and API Gateway, the machine learning experts could easily integrate the S3 and DynamoDB data to create and train a model and host an endpoint for its use. An API Gateway with a Lambda backend provides API access to the endpoint.
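The S3-triggered step could look like the following Lambda handler sketch. It shows only event parsing and template dispatch; the bucket layout, prefixes and the `TRANSFORMS` registry are assumptions, and the actual reads and writes to S3, DynamoDB and Redshift are indicated by comments so the sketch stays self-contained:

```python
import urllib.parse

# Hypothetical registry of transformation templates, keyed by the
# prefix of the uploaded chunk (assumed layout, not VNR's actual one).
TRANSFORMS = {
    "orders/": lambda rec: {"pk": rec["id"], "total": rec["qty"] * rec["price"]},
}

def handler(event, context=None):
    """Invoked once per uploaded chunk via an S3 event notification.
    One invocation per object is what enables the massive parallelism."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys arrive URL-encoded, so decode before use.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        prefix = key.split("/")[0] + "/"
        transform = TRANSFORMS[prefix]
        # In production: fetch the object with boto3, apply `transform`
        # to every row, then write to DynamoDB / stage for Redshift COPY.
        results.append({"bucket": bucket, "key": key})
    return {"processed": results}
```

A sample event with `Records[0].s3.object.key = "orders/chunk-0001.json"` would yield that key (decoded) in the `processed` list; in the real pipeline each such invocation runs independently and in parallel with the others.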


Results

The new processes, especially the ETL, are significantly faster, more efficient, more stable and easier to maintain. Processing costs are considerably lower than with a manually maintained execution environment. S3 and DynamoDB are also very cost-effective at this data volume and scale seamlessly while maintaining high availability and durability. SageMaker covers the integration of data and compute needed for training and hosting machine learning models, and makes it easy for new members of VNR's ML team to get started.

Services Used

Amazon S3, AWS Lambda, Amazon DynamoDB, Amazon Redshift, AWS CloudFormation, Amazon SageMaker, Amazon API Gateway


Time savings
Efficiency gain
Cost reduction

Your Contact for Questions about Our Solutions

Nicolas Ley
Expert for the Media & Entertainment Industry