Scale teamed up with Fika Ventures to host the 2019 Data Summit. The event included nearly 100 Chief Information Officers, Chief Data Officers, and Chief Technology Officers, as well as founders from a select group of data-centric startups.
The agenda focused on the key challenges companies are facing in creating, managing, and leveraging data at scale. The event featured a mix of speakers, panels, and interactive discussions covering a range of timely topics: data science, infrastructure, tools, security, machine learning, recruiting, and more.
I wanted to share some of the highlights and insights from the event.
Democratizing Data Within Your Organization: Data Discovery
Deepak Tiwari, Head of Product Management, Platforms and Data, Lyft
“How do we democratize data within the organization?”
We started the summit with a presentation by Deepak Tiwari, Head of Product Management, Platforms and Data at Lyft. Under Deepak’s stewardship, data is central not only to product but increasingly to business users throughout the organization. The company is applying its expertise with real-time observability in the product to work towards capabilities that provide business metric observability in real time, for use cases like fraud detection. Deepak provided architecture diagrams and walked everyone through the thinking behind Lyft’s data systems. He outlined how they choose between open source, commercial products, and building internally.
Questions from the audience prompted discussions on why Lyft has a very high bar for build vs. buy decisions; how the company determines whether or not to open source its technologies; and the challenges of linking operational (transactions) and analytical databases.
Modern ML Platforms: Getting ML Models from the Lab Into Production
Kurt Smith, Manager, Styling Recommendations Algorithms, Stitch Fix
Josh Wills, Software Engineer, Slack
Franziska Bell, Director, Data Science, Data Science Platforms at Uber
Ariel Tseitlin, Partner, Scale Venture Partners (Moderator)
“Algorithms and machine learning really drive a lot of strategic and tactical decisions we make as a company.”
I was pleased to moderate a panel with three engineering leaders solving very distinct business challenges using machine learning. The discussion began with Fran Bell describing why Uber is leaning so heavily on ML to enhance customer and user experiences: it’s the right approach given the scale they operate at, it supports real-time operations, and it allows the company to understand its customers and markets at a global scale.
Kurt Smith from Stitch Fix and Josh Wills from Slack shared their own thinking about a key organizational question: whether to centralize or decentralize data science and ML engineering. The group discussed the advantages of hiring full-stack data scientists–offset by the difficulty of finding qualified candidates. One solution shared was internal training programs to up-level business analysts into ML roles.
Next came a rapid-fire discussion on the difficulties of getting ML models into production–judging from the groans, still very much an unsolved problem at most organizations. Talk turned to issues like how hard it is to manage and monitor tests in production, the need to educate data science teams about production safeguards, and questions about feature engineering, optimization, what to track, and how to manage alerts.
The session concluded with questions covering the friction between traditional data science and engineering roles, business drivers behind developing end-to-end data and ML platforms (like Uber’s Michelangelo), and creative ways to locate qualified talent (hint: upgrade your blog).
The Role of Smart Data
Japjit Tulsi, CTO, Carta
Alvina Antar, CIO, Zuora (Moderator)
“We’ve heard about the importance of data, data intelligence, machine learning. But what is smart data?”
The Data Summit continued with a moderated discussion on the topic of “smart data.” Alvina Antar, CIO at Zuora, guided Japjit Tulsi, the CTO of Carta, through a wide-ranging discussion on better ways to warehouse and leverage a company’s data resources. Drawing insights from his experiences at Microsoft, Google, and eBay, Japjit framed familiar enterprise use cases in terms of how data itself can be better managed to readily support new business initiatives.
By the end of the discussion, a working definition emerged for “smart data”: not simply data that supports reporting and analysis, but data that contributes to understanding which areas deserve reporting and analysis. There are endless metrics and performance dimensions a company could analyze; smart data helps answer which ones should be analyzed.
Five Keys to Digital Transformation
Scott Johnston, GM of Enterprise Solutions, Docker
TX Zhuo, General Partner, Fika Ventures (Moderator)
“Your open source community will take you places you never dreamed of.”
The Data Summit concluded with a fireside chat between Fika partner TX Zhuo and industry veteran Scott Johnston of Docker. The discussion was framed around the history of Docker’s embrace of open source, diving into questions like how it decides which systems to open source and which to keep proprietary.
The audience weighed in with questions and comments about the proper pacing for transitioning business apps to the cloud, whether the security aspects of Docker containers should be open source, and how Docker allows organizations to restructure their data science and data engineering teams.
The formal program concluded with Scott’s session, just in time for everyone to unwind at a whisky tasting overlooking the Presidio and the Golden Gate Bridge in the distance. A fitting conclusion to an insightful and collegial event. I hope our guests found the day as informative as I did–and I look forward to doing this again soon.