In the past few months, I think we've witnessed the end of the "Big Data" hype and the rise of "Data Science".
To me, this is the natural transition from buzzword to useful technology.
Today, I see more and more analysts turn their Excel workbooks into Jupyter notebooks and share data and insights within their company, for everyone to see and act upon.
To me, we're finally achieving the years-old dream of sharing data across organizations, even between teams that are seemingly unrelated. We're on the verge of synergy.
But this dream is still fragile, and the biggest threat to all dreams is the same: disappointment.
To be used, a data analysis service should of course provide insightful data, but also be dependable, and that's where the problems start.
Software design and deployment are a craft, as much as data science is, and few people excel at both.
Most people in the field, used to dealing with complicated software stacks and spending their days on the command line, will happily set up their own solution to expose their latest machine learning service.
And that's perfectly fine and normal, because that's what makes most sense in today's world of tight R&D budgets, and because data analysts don't usually come from a CS curriculum.
But soon you end up with as many custom-built servers as you have teams within the organization, each implementing their own security model, authentication backends, mail notification, paging and load-balancing.
Add to the mix the high turnover in the industry, GitHub-stars-based platform selection, unpinned dependencies, ... and good luck keeping the services up and running a few months from now.
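To make the "unpinned dependencies" point concrete, here is a minimal sketch of a check a team could run before deploying. It assumes a pip-style requirements file and treats only exact `==` versions as pinned; the function name `unpinned_requirements` and the sample packages are hypothetical, chosen for illustration only.

```python
import re

# A requirement counts as pinned only if it uses an exact version,
# e.g. "pandas==1.5.3". Wildcards such as "==1.*" are not accepted.
PINNED = re.compile(r"^[A-Za-z0-9._\[\],-]+\s*==\s*[^\s*]+$")


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical requirements file content, for illustration only.
example = """
numpy==1.24.2
pandas>=1.0
scikit-learn
requests==2.31.0  # pinned
"""
print(unpinned_requirements(example))  # ['pandas>=1.0', 'scikit-learn']
```

This is deliberately simplistic: it ignores environment markers, URLs and hashes, and a real project would more likely rely on a lock file. The point is only that this kind of hygiene is cheap to automate.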
Of course, in a corporate environment, IT services provide infrastructure and often help with the deployment of business-produced applications. But the production of said applications is either:
To me, having so many business people writing their own services, automating their work, producing sophisticated machine learning algorithms, is an exciting opportunity for organizations.
But if they fail to deliver what can be expected from a professional service, it might quickly go to waste, to everyone's loss.
Basically, if you produce code for a living, then you should (learn to) code properly.
If you are assembling a team of data scientists, you might consider this setup:
You should be able to compose such a team with people you already work with, and quickly start shipping valuable services for your organization.
Eric is a software engineer and the founder of Adimian, splitting his time between making his customers' lives better, filling out paperwork and doing puzzles with his twin boys.