Big Data Week in Atlanta 2014

Data Science ATL is growing fast. Like 30+% quarter-over-quarter fast. As a result of this hyper-growth, we have added a new member to our leadership team, Rhea Nara, to help find out what members want from us and to get the message out faster about all the great events we have planned for 2014. Outside of Data Science ATL, Rhea works as a relationship manager at Emcien, and we are delighted to have her join our team to better serve Atlanta’s data science community.

We are working hard as a team to bring our more than 1,300 members valuable learning opportunities, business and social networking events, and job opportunities in data science. We recently hosted our 30th event since June 2012, and our Big Data Week Atlanta event will certainly be our best yet.

Big Data Week is an internationally celebrated, week-long forum of independently hosted events in cities around the world. Big Data Week in Atlanta is back for the second year in a row after a very successful showing in 2013 of over 1,000 unique registrants, 22 events, and more than 50 speakers. Data Science ATL is hosting our event for Big Data Week on Saturday, May 10 at the Georgia Tech Research Institute (GTRI) Conference Center as a part of the week of great events.

Our keynote speaker, Paco Nathan, is a data science thought leader as well as a “player/coach” who’s led innovative data teams building large-scale apps for over ten years. According to his bio, “Paco is an expert in distributed systems, machine learning, and enterprise data workflows. He is also an O’Reilly author, engineering consultant, and an advisor for several firms including The Data Guild. Paco received his M.S. degree in computer science from Stanford University, and has 25+ years technology industry experience ranging from Bell Labs to early-stage start-ups.”

Paco will also be hosting data science workshop classes in Atlanta on the Monday (5/12) and Tuesday (5/13) after Big Data Week Atlanta. You won’t want to miss this unique opportunity to learn from a talented Silicon Valley data scientist! More details on these workshop sessions will be communicated to the group directly as we get closer to our Big Data Week event.

After the keynote we will present a panel of data science experts including Michael Schmidt, CEO, Nutonian. Data Science ATL’s very own Andrew Gardner, PhD will moderate the panel. According to Schmidt’s bio, “Michael’s research focuses on ‘Machine Science’ – a direction in artificial intelligence research to accelerate data-driven discovery. Over the past 6 years, he has worked on algorithms and techniques to automate knowledge discovery from data. In particular, he has published extensively on identifying mathematical relationships (such as laws of physics) in experimental data, and algorithms in evolutionary computation.”

Other panelists to be announced soon include data science experts in visualization, infrastructure, and academia. This will be the best slate of speakers in our two years of hosting great data science events for the Atlanta community. Please spread the word so we can be sure to have a huge turnout for these great minds in data science.

Lunch will be served before the event, and we expect to fill the GTRI Conference Center to capacity, so please RSVP today to ensure your spot!

Data Science for Social Good

Last year something incredible happened at the University of Chicago. Thirty-six aspiring data scientists spent their summer working in small teams on challenging real-world problems in education, health, energy, transportation, and more. They used data science techniques, including data mining and machine learning, to directly benefit local communities in need. They applied their coding and analytics skills under the guidance of mentors from industry and academia, as well as Rayid Ghani (@rayidghani), the chief data scientist of the 2012 Obama campaign.

These graduate and undergraduate Fellows came from quantitative and computational fields spanning computer science, statistics, and public policy. The results were simply amazing and changed the lives of the Fellows forever, turning them into leaders in data science with the skills and passion to change their local communities for the better.

Watch this short video to meet some of the participants from last year and to see their unbridled enthusiasm for the Data Science for Social Good Summer Fellowship program.

This summer the Data Science for Social Good (DSSG) Summer Fellowship program is coming to Atlanta through the support of Georgia Tech and the City of Atlanta.

DSSG will be a landmark event for data science in Atlanta! Our hope is that Fellows, mentors, and project partners will benefit from national exposure of their work through a demo day at the end of the summer, and that some of the project partners will implement the Fellows’ work to benefit the Atlanta Metro Area. We also hope that DSSG becomes an annual event going forward and builds into a premier showcase for Georgia Tech and Atlanta.

The long-term benefit of a program like this is that it encourages a concentration of data scientists and data-savvy business and non-profit leaders to come to or remain in Atlanta. These leaders are likely to become professors, lead non-profits, or start new data-centric businesses that foster economic development in Atlanta.

If you are interested in applying to be a DSSG Fellow, mentor, or project partner, or just want to be updated about future events, please go to DSSG-ATL.io to join our mailing list. We are working to update the page so you can apply directly through our website, so look for an update in February.

Best,

Travis Turney (@travturn)

Cofounder, Data Science ATL

Grand Challenges

Since my inaugural blog post last week on the Hack for CF hackathon (RSVP today!) I have been thinking about grand challenges and specifically grand challenges where data science can be a critical catalyst. My first instinct was to reach out to the 750+ members of @DataScienceATL and see what they thought were the most ambitious challenges before the human race and how data science could play a central role.

The feedback was surprising and only encouraged me to dive deeper. I’ll have more on the group’s feedback in a future blog post.

The great thing about data science is that it is a meritocracy unparalleled in human history that can flourish anywhere two or more smart people congregate physically (see @DataScienceATL) or virtually (see GitHub). Insights from data science, when shared widely, can be applied instantly and everywhere, causing all boats to rise and inspiring new voyages. The journey is never over. And that’s a good thing.

The future is wide open and nowhere more so than in the universe of data science. Data science is an inherently collaborative discipline requiring such a broad yet focused set of skills that no individual can do it alone. It takes a village to make advances in data science. This is also a good thing.

This raises the question: what exactly is data science?

Honestly, I can’t tell you exactly. But I can share something that gives an idea of what it is. A data scientist is a better hacker than any of their statistician peers and a better statistician than any of their hacker peers. A chief data scientist has a very interesting role that draws on both of these skills, among many others, but is first and foremost chief recruiter, building a team that covers all critical areas of data science. Dominant companies that depend on data science will likely have multiple data science teams as they begin to scale and take full advantage of their insights.

This collaboration requirement goes beyond a given team in a given company and extends within and across industries. There isn’t a data science industry per se. Rather, there exists the practice of data science. Data science is applicable to all industries, and the winners will embrace it, invest in it, and dominate because of it. Data science, like computer science, is largely a human capital play. You can’t just buy a data science patent and wrap a killer enterprise sales team around it. Perhaps this is necessary for success, but it is certainly not sufficient.

The only way to “solve” a grand challenge in data science is collaboration at a scale unprecedented in human history while incorporating hyper-specialization. Open source software is to computer science as open source collaboration is to data science. One common source of this collaboration is the panoply of PhD theses that are publicly available. Again, this is necessary but not sufficient.

At ProductCamp ATL 2013, @KylePorter of SalesLoft shared an insight: once SalesLoft started sharing secrets and insights with customers, the company began building a following of fanatic subscription customers. This is an insight that other CEOs can learn from. I know I did. Once you start giving away your crown jewels you begin to engage customers in a way that just wasn’t possible before. The best way to differentiate your company is to get the smartest and hardest working team you can possibly assemble and get to hustling. This is even more critical for data science.

We need more companies to provide this kind of thought leadership. Google publishes original research that is core to their business (or at least was at one time), and nearly every “big data” company you have heard of exists because of it. Google published a paper on MapReduce, and the open source community, with significant contributions from Yahoo!, created Hadoop. Now Cloudera, Hortonworks, and MapR are building businesses based on Hadoop as a result of that community effort. Google later published a paper on Bigtable, and now several “NoSQL” companies are competing to dominate the unstructured database market. Google will continue to publish original research and bright software entrepreneurs will capitalize on their proven insights, but more companies need to follow suit.

I’m optimistic that companies publicly sharing original research will become commonplace, mainly because so many chief data scientists come from academia and that’s in their nature. If CEOs don’t like it, chief data scientists will start their own companies and publish original research anyway. Facebook is experimenting with sharing original research that is fundamental to their business by championing the Open Compute Project.

Solving the grand challenges of the next decade will require everyone to share their crown jewels. Patent protection will not help solve grand challenges. People will solve grand challenges and chances are some of the most critical constituent members of these teams are randomly dispersed around the world and not necessarily in Silicon Valley. Certainly having concentrations of talent geographically helps. It helps a lot! It’s necessary but not sufficient.

Share now, share often, and shout it out from the rooftops or the digital equivalent! It’s the only way we’ll get to where we need to go.

What are the grand challenges over the next decade or next century that we should solve? What are you doing to get there?

Best,

J. Travis Turney

Co-founder @DataScienceATL