Since my inaugural blog post last week on the Hack for CF hackathon (RSVP today!), I have been thinking about grand challenges, specifically grand challenges where data science can be a critical catalyst. My first instinct was to reach out to the 750+ members of @DataScienceATL and ask what they thought were the most ambitious challenges facing the human race and how data science could play a central role in meeting them.
The feedback was surprising and only encouraged me to dive deeper. I’ll have more on the group’s feedback in a future blog post.
The great thing about data science is that it is a meritocracy unparalleled in human history, one that can flourish anywhere two or more smart people congregate, whether physically (see @DataScienceATL) or virtually (see GitHub). Insights from data science, once shared widely, can be applied instantly and everywhere, lifting all boats and inspiring new voyages. The journey is never over. And that’s a good thing.
The future is wide open and nowhere more so than in the universe of data science. Data science is an inherently collaborative discipline requiring such a broad yet focused set of skills that no individual can do it alone. It takes a village to make advances in data science. This is also a good thing.
This raises the question: what exactly is data science?
Honestly, I can’t tell you exactly. But I can share something that gives an idea of what it is: a data scientist is a better hacker than any of their statistician peers and a better statistician than any of their hacker peers. A chief data scientist has a particularly interesting role. It draws on both of these skills, among many others, but first and foremost the chief data scientist is the chief recruiter, responsible for building a team that covers all the critical areas of data science. Dominant companies that depend on data science will likely have multiple data science teams as they scale and take full advantage of their insights.
This need for collaboration goes beyond a single team in a single company; it extends within and across industries. There isn’t a data science industry per se. Rather, there is the practice of data science. Data science is applicable to all industries, and the winners will embrace it, invest in it, and dominate because of it. Data science, like computer science, is largely a human capital play. You can’t just buy a data science patent and wrap a killer enterprise sales team around it. Perhaps that is necessary for success, but it is certainly not sufficient.
The only way to “solve” a grand challenge in data science is through collaboration at a scale unprecedented in human history, combined with hyper-specialization. Open source collaboration is to data science what open source software is to computer science. One common source of this collaboration is the panoply of publicly available PhD theses. Again, this is necessary but not sufficient.
At ProductCamp ATL 2013, @KylePorter of SalesLoft shared an insight: once he started sharing secrets and insights with customers, his company began building a following of fanatical subscription customers. This is an insight other CEOs can learn from. I know I did. Once you start giving away your crown jewels, you engage customers in a way that just wasn’t possible before. The best way to differentiate your company is to assemble the smartest and hardest-working team you possibly can and get to hustling. This is even more critical for data science.
We need more companies to provide this kind of thought leadership. Google publishes original research that is core to its business (or at least was at one time), and nearly every “big data” company you have heard of exists because of it. Google published a paper on MapReduce, and the open source community, with significant contributions from Yahoo!, created Hadoop. Now Cloudera, Hortonworks, and MapR are building businesses on Hadoop as a result of that community effort. Google later published a paper on Bigtable, and now several “NoSQL” companies are competing to dominate the unstructured database market. Google will continue to publish original research, and bright software entrepreneurs will capitalize on its proven insights, but more companies need to follow suit.
I’m optimistic that companies publicly sharing original research will become commonplace, mainly because so many chief data scientists come from academia, where sharing is second nature. If CEOs don’t like it, chief data scientists will start their own companies and publish original research anyway. Facebook is already experimenting with sharing work that is fundamental to its business by championing the Open Compute Project.
Solving the grand challenges of the next decade will require everyone to share their crown jewels. Patent protection will not help solve grand challenges. People will solve grand challenges, and chances are some of the most critical members of these teams are scattered around the world, not necessarily in Silicon Valley. Certainly, geographic concentrations of talent help. They help a lot! They’re necessary but not sufficient.
Share now, share often, and shout it out from the rooftops or the digital equivalent! It’s the only way we’ll get to where we need to go.
What are the grand challenges over the next decade or next century that we should solve? What are you doing to get there?
J. Travis Turney