IT Services present at Big Data London event

Authors: Jonty Lewis (IT Agile Coach), Max Williams (Data Engineering), Tim Packwood (IT Customer Engagement), Julian Kobylarz (BI and Data Engineering Manager)

What is Big Data LDN?

Big Data LDN (London) is an annual event in London focused on the latest in big data and analytics. It features keynotes, workshops, and exhibits on topics like data management, AI, and machine learning, attracting professionals such as data scientists, IT leaders, and business executives. It’s a key opportunity for learning, networking, and exploring new technologies in the data industry. A small cohort of the IT Services team attended the event.

View of the Big Data London event.

Members of IT Services were invited to present their strategic views, direction, and the work currently being undertaken to build the university’s data engineering capacity and capabilities. The invitation came, in recognition of the team’s thought leadership, from two of IT Services’ technology partners in this space: Snowflake and dbt Labs (makers of the data build tool).

Presentation by IT Services about the data engineering service design at the university.

The talk, titled “Building a data engineering capability: Unlocking the value in a university’s data”, focused on:

  • The importance of investing in modern, scalable technical data platforms.
  • Positively challenging outdated organisational understandings of modern data analytics and data warehousing.
  • Implementing new development practices and platforms that improve quality and delivery.
  • Developing IT Services’ engineering culture away from traditional BI and towards modern data engineering.
  • The challenges of transforming organisational culture and processes.

The presentation received positive feedback for its views on modernising data engineering practice through the adoption of best practices traditionally found in software engineering.

What is the Snowflake data platform?

The Snowflake Data Platform is a cloud-based solution from Snowflake Inc., offering scalable data warehousing. Its features include the separation of compute and storage for cost efficiency, support for diverse data types, and a multi-cluster architecture for high concurrency. It also provides advanced data management, zero-copy cloning, secure data sharing, and automated scaling, along with robust security measures. Snowflake simplifies the management of data infrastructure, letting teams focus on extracting insights from their data. It is the university’s data warehousing and analytics solution.
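As a flavour of how this looks in practice, here is a minimal sketch using Snowflake’s official Python connector to create a zero-copy clone. The account details, warehouse, and table names are hypothetical placeholders, not the university’s real configuration.

    # A minimal sketch using the snowflake-connector-python package.
    # All credentials and identifiers below are hypothetical placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",      # placeholder account identifier
        user="my_user",
        password="my_password",
        warehouse="ANALYTICS_WH",  # compute, provisioned separately from storage
        database="WAREHOUSE_DB",
        schema="PUBLIC",
    )
    cur = conn.cursor()

    # Zero-copy cloning: the clone shares the source table's underlying storage
    # until either copy is modified, so it is created almost instantly and costs
    # nothing extra until the data diverges.
    cur.execute("CREATE TABLE enrolments_dev CLONE enrolments")

    cur.close()
    conn.close()

Because compute is separate from storage, a clone like this gives developers a full, safe copy of a production table to work against without paying to duplicate the data.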

What is dbt (the data build tool)?

dbt (the data build tool) is an open-source tool that enables data analysts and engineers to transform data in their warehouse more effectively. It is specifically designed to facilitate the workflow of transforming data with SQL, making it cleaner, more efficient, and easier to manage. dbt does this by allowing users to write modular SQL queries, which it then runs in the correct order with the proper dependencies, while providing testing, documentation, and version control capabilities.
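To make that concrete, here is a minimal, hedged sketch of driving dbt from Python via its programmatic entry point (available from dbt-core 1.5 onwards); it assumes the working directory already contains a configured dbt project.

    # A minimal sketch using dbt's programmatic Python interface
    # (dbt-core >= 1.5). Assumes a configured dbt project in the
    # current working directory.
    from dbt.cli.main import dbtRunner, dbtRunnerResult

    runner = dbtRunner()

    # 'dbt run' compiles every model (a SELECT statement in models/*.sql),
    # builds the dependency graph from the {{ ref('...') }} calls between
    # models, and executes them against the warehouse in the right order.
    result: dbtRunnerResult = runner.invoke(["run"])

    # 'dbt test' then runs the data-quality tests declared alongside the
    # models, such as not-null and uniqueness checks.
    if result.success:
        runner.invoke(["test"])

Because models are plain SQL files under version control, the same project can be reviewed, tested, and deployed through a CI pipeline like any other codebase.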

The framework has transformed BI development into a more DevOps-orientated set of practices that improve the quality, observability, and speed at which data engineering teams can work.

Reflections from the wider event

Members from across IT Services attended the event.

Reflections on the event from our agile coaching team – Jonty Lewis

Federico Frumento (Vodafone) highlighted that GDPR was the stick that forced many organisations to take data governance more seriously. Now, GenAI is the carrot that can be used to encourage a data culture and investment in governance.

A huge variety of suppliers now offer natural language querying of data, combined with documentation about the data itself. This reduces the load on core data teams and opens up data queries to less technical users. For instance, a large language model can generate SQL from a natural language question for people to run.
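As a hedged sketch of that pattern on our own platform, Snowflake’s Cortex COMPLETE function can draft SQL from a plain-English question. The connection details, table schema, and prompt below are illustrative assumptions, not a description of any live service.

    # A hedged sketch of natural-language-to-SQL using Snowflake Cortex's
    # COMPLETE function (requires an account with Cortex enabled). The
    # credentials, table schema, and prompt are illustrative only.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="my_password",
    )
    cur = conn.cursor()

    question = "How many students enrolled in each faculty last year?"
    prompt = (
        "Write a single Snowflake SQL query answering the question below. "
        "Available table: enrolments(student_id, faculty, enrolled_on).\n"
        f"Question: {question}"
    )

    # The model drafts SQL; a human (or an automated check) should review
    # the output before running it, as generated queries can be wrong.
    cur.execute("SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', %s)", (prompt,))
    print(cur.fetchone()[0])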

Regardless of team size, all data engineering teams seem to be oversubscribed; hence AI is a good fit here to help alleviate the load.

Chris White (CTO at Prefect) gave a highlight talk on resilient data pipelines. He framed the whole talk by looking at lessons learned in the past, exploring the nature of complex systems, and emphasising that reliability is an emergent property, not something that happens instantly.

Reflections on the event from our data engineering team – Max Williams

Unsurprisingly, AI loomed large at Big Data LDN 2024. Of the 272 lectures advertised over the two days, 96 (35%) explicitly mentioned AI in their titles. And many more discussed AI, with four whole lecture theatres being dedicated at least in part to talks on the subject.

One of the most exciting talks we saw in this space was actually a demo given at the Snowflake booth on using their AI tooling to let users query our data in natural language. Snowflake’s offering for integrating this functionality into our existing data warehouse appeared more developed than the examples we saw from other organisations that had built their own bespoke language-to-code tools. This disparity affirmed the value of using existing tools before building our own, and we’re excited to get stuck in and make this feature available to our data users at the University.

The other side of what we heard about AI’s value to data users was the increased velocity and broader insight it can bring to the work of data engineers themselves. A lecture on harnessing Snowflake’s AI platform for processing structured and unstructured data left me keen to broaden the use of AI in my own workflows.

Despite the interest these talks generated, our team generally found itself more drawn to talks on other, maybe more fundamental data engineering tools and processes. At times, these left us with more questions than answers. A talk on the ineluctable rise of the data lakehouse usefully highlighted its value for managing integrations, but never fully made clear the case for its inevitable architectural dominance.

Other talks brought valuable new insights to topics such as governance as code and designing resilient data pipelines. The latter took as its starting point the certainty that our pipelines will break before outlining tech-based solutions for resilience and failing gracefully.

Overall, as someone new to the field, it was the talks around data engineering fundamentals that captivated me the most, and I left excited to embed these principles in my work. It was just a shame that we missed some great-looking talks due to capacity issues at the venue. I was also slightly puzzled by Thursday’s keynote, Data in Sport. While it was very cool to share a room with some national treasures, the speakers’ best efforts to pull in occasional data references couldn’t prevent the conversation from dissolving into a (nonetheless entertaining) discussion of their sporting experiences. It would be great to see Big Data London address these issues when we return next year.

What comes next?

Through platforms like Snowflake and technologies like dbt, IT Services now has a modern technical platform from which to help deliver the university’s data strategy ambitions. But that’s the easy bit… building a data culture, data skills, and new ways of working at a large university will take a lot more time and effort.

If you’d like to know more about the current work and technology, please reach out to Julian Kobylarz or Tim Packwood.

Useful Links

https://www.snowflake.com/en – Snowflake data platform website

https://www.getdbt.com – Data build tool website

https://www.youtube.com/c/Bigdataldn – Big Data London presentations