Hot on the heels of Snowflake’s blockbuster IPO comes the Data Cloud Summit 2020. The event includes guidance and commentary from Snowflake’s executive team and presentations by more than 25 partners and industry experts. There is even a celebrity headliner, The Daily Show’s Trevor Noah.
This whole-day event consists of over forty sessions, broken up into eight subject areas. In case you couldn’t attend, this post will cover the highlights. We’ll see how Snowflake’s executive team plans to transform the platform using the Data Cloud. Also, we will look at how existing customers are already leveraging the Data Cloud to drive successful BI transformations and satisfy their business needs.
Oh, and if you’re wondering about Trevor Noah, we’ll get to that as well.
The first thing that should catch your attention is the total absence of the term “cloud-based data warehouse.” Snowflake has strategically rebranded its product as a “Data Cloud.” As attendees quickly come to realize, this is far from a marketing gimmick. The Data Cloud concept is precisely what Snowflake expects will differentiate its platform from the competition.
Snowflake defines “Data Cloud” as “an ecosystem of thousands of businesses and organizations connecting to not only their own data, but also connecting to each other by effortlessly sharing and consuming shared data and data services.” The Data Cloud is a confluence of Data Warehousing, Data Lake, Data Engineering, Data Science, and Data Application development across multiple cloud providers and regions from anywhere in the organization.
Have you ever watched a nature documentary and seen a snowflake take form? Hold on to that mental image.
With an industry-leading data warehouse at its core, Snowflake is now extending its utility by allowing best-of-breed service providers to seamlessly integrate and form the Data Cloud. You can discover these services via Snowflake’s “Partner Connect.” Likewise, premium market data is instantly available through Snowflake’s zero-copy data sharing in the “Data Marketplace.”
With the Data Cloud, all Snowflake accounts are part of a single data universe.
Can you see what this means for Snowflake’s customers?
Competitive Advantage as a Commodity
Economic theory suggests that businesses should focus on what they can do better or cheaper than their competitors and rely on third parties for everything else. This is called a competitive advantage.
Imagine you are a Rent-A-Car company running analytics on Snowflake. Your competitive advantage lies in optimizing your fleet of automobiles for client demand across several geographic markets.
Suppose you decide that a predictive Machine Learning model can help you increase that competitive advantage even further. In that case, you won’t have to dedicate resources to create an ML platform and Operations infrastructure. A Snowflake-trusted partner like Dataiku or DataRobot can get you up and running in minutes because they have a competitive advantage in AI.
Would the latest data from the tourism industry help make that data model more accurate? Again, no need to dedicate resources for sourcing it and managing an external pipeline. Data Marketplace will make it instantly available as if it were a local table.
How about documentation and data governance in your growing DB landscape? Does it make sense to divert the attention of the BI team to this time-consuming enterprise when tools like SqlDBM can generate it automatically?
With Data Cloud, Snowflake can continue to focus on its own competitive advantage (providing a zero-maintenance, high-performance data warehouse) while allowing you to focus on yours. By offloading auxiliary tasks to the vendors who do them best, your team is free to concentrate on what sets your business apart.
Snowflake brings the concept of competitive advantage as a commodity full circle by allowing its customers to become part of the Data Marketplace. Organizations can now sell their proprietary insights to the Snowflake community, leveraging the instant copy-less sharing and security features of the Data Cloud.
Get Ready for New Features
Christian Kleinerman, Snowflake’s SVP of Product, took the audience through a lightning tour of all the new functionality being added to the core platform. Although some of these features warranted their own in-depth sessions, here is a rundown of what is being rolled out in the coming months.
The days of maintaining tables of user-permitted values and joining them to transactional data in a weak attempt at security are behind us. Snowflake is finally integrating row-level security directly into its roles, so that the rows a user can see depend on the role they hold.
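To make the idea of role-driven row visibility concrete, here is a minimal Python sketch of the concept. It is only an illustration and not Snowflake’s implementation or syntax: the role names, the `ROW_POLICY` mapping, and the `visible_rows` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Sale:
    region: str
    amount: float


# Hypothetical policy: each role maps to a predicate that decides
# whether a given row is visible to that role.
ROW_POLICY: Dict[str, Callable[[Sale], bool]] = {
    "GLOBAL_ANALYST": lambda row: True,
    "EMEA_ANALYST": lambda row: row.region == "EMEA",
}


def visible_rows(role: str, rows: List[Sale]) -> List[Sale]:
    """Return only the rows the role may see; unknown roles see nothing."""
    allowed = ROW_POLICY.get(role, lambda row: False)
    return [row for row in rows if allowed(row)]


SALES = [Sale("EMEA", 120.0), Sale("APAC", 80.0), Sale("EMEA", 45.5)]

if __name__ == "__main__":
    for role in ("GLOBAL_ANALYST", "EMEA_ANALYST", "INTERN"):
        print(role, len(visible_rows(role, SALES)))
```

The point of the sketch is that the filter lives with the role rather than in a hand-maintained lookup table that every query has to remember to join.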
Another feature that will help organizations guard their data for security and compliance reasons is data masking. Like the previous feature, it is role-based. A customizable masking policy can be defined and applied to any column in a table or view.
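A role-based masking policy attached to a column can be pictured roughly as follows. This is a conceptual Python sketch under stated assumptions, not Snowflake’s actual policy syntax: the `MASKING_POLICIES` registry, the `PII_ADMIN` role, and both functions are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Set, Tuple


def mask_email(value: str) -> str:
    """Hide the local part of an e-mail address, keeping the domain."""
    _, sep, domain = value.partition("@")
    return ("*****" + sep + domain) if sep else "*****"


# Hypothetical registry: column name -> (roles allowed to see clear text,
# function used to mask the value for everyone else).
MASKING_POLICIES: Dict[str, Tuple[Set[str], Callable[[str], str]]] = {
    "email": ({"PII_ADMIN"}, mask_email),
}


def apply_masking(role: str, row: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the row with policy-protected columns masked."""
    result = {}
    for column, value in row.items():
        policy = MASKING_POLICIES.get(column)
        if policy is not None and role not in policy[0]:
            result[column] = policy[1](value)
        else:
            result[column] = value
    return result


if __name__ == "__main__":
    record = {"email": "ann@example.com", "city": "Oslo"}
    print(apply_masking("ANALYST", record))
    print(apply_masking("PII_ADMIN", record))
```

Because the policy is keyed on the column rather than on any one query, the same rule applies wherever that column is read.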
The SQL Worksheets have a new look, which is already available for preview across all Snowflake accounts. The new interface allows for column-based metadata previews of the information retrieved from any SELECT, as well as its visualization through charts and dashboards. Customers can now perform data discovery and profiling directly through Snowflake instead of having to connect through a third-party reporting tool.
This one generated tremendous buzz in the technical sessions. With Snowpark, Snowflake enables server-side programming language extensibility. Snowflake customers can now write complex programs in Java, Scala, and Python, and execute them directly on Snowflake without relying on drivers, APIs, and external compilers.
With the tagging feature, Snowflake customers can now label their tables for purposes of data governance and security. This feature can be combined with the previously mentioned row-level security in order to apply permissions in bulk across multiple tables, schemas, and even clouds.
According to Christian, Snowflake is continuously working behind the scenes to improve the optimization features of its product. This results in lower wait times and, as you would expect in a pay-per-use model, savings for its customers.
“I literally almost fell off my chair when I saw the performance comparison to SQL Server”
– Margaret Sherman, Sonos | Head of Data Strategy
There are specific improvements as well. Snowflake is rolling out search optimization improvements for strings and unclustered data. This will allow for faster analytics for users like data scientists who need to mine and parse such data.
Additionally, Snowflake announced a “Query Acceleration Service,” which can automatically scale out large queries by breaking them down into smaller subqueries, each running on its own compute cluster. This functionality is highly customizable and has been reported to make queries up to fifteen times faster.
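The general pattern of breaking a large job into independent pieces and combining the partial results is often called scatter-gather. The Python sketch below illustrates that pattern only; it does not show how the Query Acceleration Service works internally, and the function names and the use of a thread pool as a stand-in for separate compute clusters are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Cut the input into at most `parts` contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum(chunk: Sequence[int]) -> int:
    """The 'subquery': aggregate a single chunk on its own."""
    return sum(chunk)


def accelerated_sum(data: Sequence[int], workers: int = 4) -> int:
    """Scatter the chunks to workers, then gather and combine the results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    print(accelerated_sum(list(range(1, 101))))
```

The approach only pays off when the work can be partitioned cleanly and the partial results are cheap to merge, as with sums and counts.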
Unstructured Data Support
Snowflake has already done a tremendous job with semi-structured data, so why not go all the way? Soon, users will be able to run queries against unstructured file metadata. For example, you might query how many image files are named with a certain keyword or are of a certain type.
Good Methods Never Go Out of Style
Whether data is your product or merely drives your product strategy, the guiding principles of BI and Analytics have not changed. Although the presentations span eight different subject tracks, the message has been consistent through all of them: the fundamentals do not change; technology evolves to keep up with them.
Focus on CI/CD and Data Ops
Data Lakes are becoming a viable alternative to a more structured Data Warehouse approach due to falling costs of data storage and the rising potential of automated pipeline solutions like Informatica, Matillion, and Fivetran.
Presentations by Convoy Inc, Texas Mutual, and AthenaHealth all spoke about the transformative potential of automated ELT methodologies to ensure the capture of high-quality data. The ELT/Data Lake approach is especially useful for organizations with varied data streams instead of a centralized transaction model.
Breaking Down the Data Walls
Talks by industry giants such as Disney and Sainsbury’s focused mainly on the importance of data governance and transparency in delivering accurate insights to customers and business users. Snowflake’s copy-less data sharing makes the age-old wisdom of having a single source of truth possible.
Office Depot presented this idea very elegantly by breaking it down into three critical components at the core of their data mobilization journey:
- Data is for everyone – BI should not be the gatekeepers of data. They should be the facilitators.
- Provide self-service for all data types for all processes – following from the previous point: all data should be available to whoever can best leverage it—metadata, master data, logs, etc.
- Offer a single source of truth with governed KPIs – data is only as good as the governance built around it. As the sharing of data becomes prevalent, so should business glossaries and governed definitions.
Organizations across all sectors and industries realize that the traditional instruction-based approach to software is limiting. With Machine Learning, we can train a model to optimize for any input, not just pre-programmed inputs, to maximize the desired outcome — thus automating software development itself.
Thanks to the Data Cloud, Snowflake finds itself ahead of the curve of this growing demand. Talks by Logitech and HP help highlight the importance of moving beyond hindsight reporting and into predictive analytics.
For those who are ready to fully embrace this endeavor, presentations by Snowflake-trusted partners like Dataiku will help illustrate the full value proposition they can provide in this space. Remember the idea of competitive advantage? It can be both illuminating and reassuring to know that trusted-partner platforms can seamlessly integrate with Snowflake and provide out-of-the-box AI/ML models and the ML Ops capabilities to get them running smoothly.
Let’s Talk About Trevor
You’ve heard of The Daily Show, now check out its rival, The Data Show! Looks like Snowflake has decided to disrupt the comedy industry as well.
Comedian Trevor Noah, admittedly, does not know a whole lot about Business Intelligence. He knows comedy. Comedy helped him escape the poverty of the South African ghettos to become one of the most widely recognized stand-up comedians in the world. And what does a comedian do?
A comedian teases out the idiosyncrasies of human behavior. Comedians observe our shared experience of the world and discover the insights to which most of us are entirely oblivious. These are insights that, when mentioned, are instantly familiar and get at the heart of human behavior. Ok, maybe he knows about Business Intelligence after all.
In his book Born a Crime: Stories From a South African Childhood, Trevor highlights many of the same insights as the technology leaders who also presented at Data Cloud Summit 2020.
Language brings with it an identity and a culture, or at least the perception of it. A shared language says “We’re the same.” A language barrier says “We’re different.”
How is that for a mission statement for the Data Governance team?
In that same book, Trevor unwittingly summarizes the central message of the Data Cloud Summit 2020:
We tell people to follow their dreams, but you can only dream of what you can imagine, and, depending on where you come from, your imagination can be quite limited.
With the Data Cloud, Snowflake makes it possible for companies to shed the ballast of traditional on-premises solutions and move to an ever-evolving cloud platform. But the Data Cloud can do so much more!
At the Data Cloud Summit 2020, Snowflake has shown its customers the full scope of what they can achieve if they dare to dream it.