Data Science Stack and Development Platforms in a Nutshell

Summary & Key Insights

The new data sciences – i.e., knowledge discovery from data involving statistics, data mining, machine learning, and predictive analytics – are powerful and replete with opportunity. Many such advances are enabling us to quantify what has always been “soft” and “squishy” about sales, marketing, psychology, human behavior, competitive market advantage and larger marketplace forces involving individual and ecosystem choice, simply because we now have mountains of data about choices, mated to new analytics that learn from such data. The era of machines learning from data has begun, and as it progresses it will undoubtedly reach into every nook and cranny of our lives, throughout every industry and every business function and process. However, it also comes with a steep learning curve and ample opportunity to get things wrong. More likely than not, adopters will hear stories of pratfalls and setbacks from others who had to recover from their first attempts. At this early stage of market development, you will find yourself using off-the-shelf offerings (such as Clarabridge to assess and improve the customer experience), while also using development tools for other purposes (such as Apache Spark to reduce patient readmission rates and costs in healthcare delivery settings). This Strategic Perspective looks at what the data science stack is, where it fits, which business projects to target for custom development and why, what is emerging and changing about it, how to frame decisions about it, and where its future value lies.

Perspective

The new analytics of data science are rapidly evolving in their ability to help people and companies more easily uncover secrets hiding in silos of data in the enterprise, use computers to find business insight without explicitly being told where to look, anticipate future outcomes based on historical data, and forecast, optimize, and explore options in order to make the best possible choice among alternatives.

Many of these advances are enabling us to quantify what has always been “soft” and “squishy” about sales, marketing, psychology, human behavior, competitive market advantage and larger marketplace forces involving individual and ecosystem choice, simply because we now have mountains of data about choices, mated to new analytics that learn from such data. The era of machines learning from data has begun, and as it progresses it will undoubtedly reach into every nook and cranny of our lives, throughout every industry and every business function and process, with new and even as-yet undreamed-of applications becoming a reality over the coming years.

Business competition requires the use of these new analytics capabilities. That usage begins with posing questions about their business purposes and uses, including the following:

  • Can these help to influence customer choice and marketplace behavior?
  • Will they accelerate our response to customers?
  • Will they assist us in achieving industry breakthroughs that provide competitive advantage?
  • Can they lead to new profit opportunities?
  • Will they aid in optimizing our strategic plan?

Asking these questions and prioritizing the answers will focus and inform everyone – from business leaders to business analysts and software development teams – about business priorities for the new analytics, while also guiding technology choices so that the new data sciences are delivered in ways that best serve the interests of the enterprise.

Among Global 500 businesses, many of the new uses of data science applications are occurring at the front line of the business, in sales and marketing, where prospects, customers and the enterprise meet. Some of these new uses focus on increasing revenue through more effective customer targeting, more orders, and better retention. Below the Global 500, the uses of data science have until recently been spottier, but they are evolving and are expected to follow similar adoption and usage patterns at the front line of business.

IT leaders are not waiting in line. They are testing the new data science by meeting it where it is today: in the hands of data scientists, data miners, and machine-learning programmers who are using the development platforms of the new data science stack. Large enterprises are hiring scarce data science talent and consultants who use the development components of the data science stack to hand-stitch together solutions that uncover, discover, and deliver new business insight and predictive business value.

Most, but not all, new machine learning algorithms are open-sourced, accelerating the kind of innovation that can only come from widespread community collaboration. Taking advantage of these capabilities during this early stage of data science innovation requires the skill and experience of data scientists, data miners, and machine-learning programmers. We explain why below.

Figure 1: Data Science Technology Stack

Source: Saugatuck Technology, an ISG business

In addition to its core development foundations, new visual workbenches for the modeling of machine learning and data-algorithm workflows are enabling data scientists, data miners, programmers and knowledgeable business analysts to more rapidly uncover value, and build, test and deliver models that can then be turned into code for subsequent pilot testing and production uses. However, not all of the early-stage visual development workbench tools are alike, especially in their ability to use underlying machine-learning algorithms.

For example, rather than taking a plug-and-play approach that can use any data learning algorithm, the early versions of these visual tools may exclude one or several critical classes of data learning algorithms. As a result, even if new visual workbench modeling tools accelerate the delivery of projects, they still require navigation and use by data scientists and experienced programmers, who can hand-select and stitch together the multiple learning algorithms that will be tested, piloted and eventually put into production.

Despite requiring data scientists, some of the new visual workbenches are automating code building by delivering self-generated code for subsequent use as pre-programmed modules that let machines learn from data no matter where it resides and no matter what format or type it takes. Unfortunately, even the self-generating code development tools are not yet testing for, or self-selecting, the most appropriate learning algorithm(s) for a given class of data, which reinforces the need for data scientists and machine-learning programmers.
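As a rough illustration of the algorithm selection that these tools still leave to data scientists, the following Python sketch compares two candidate learning algorithms by cross-validated error and keeps the one that predicts held-out data best. The synthetic data, the two candidate fitters, and every function name here are assumptions made for illustration only; none of them comes from a specific workbench product.

```python
import random
from statistics import fmean

def fit_mean(xs, ys):
    """Baseline learner: always predicts the average of the training targets."""
    average = fmean(ys)
    return lambda x: average

def fit_line(xs, ys):
    """Ordinary least-squares straight line fitted to the training data."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def cv_error(fit, xs, ys, folds=5):
    """Mean squared error over k folds; each point is held out exactly once."""
    squared_errors = []
    for k in range(folds):
        train = [i for i in range(len(xs)) if i % folds != k]
        test = [i for i in range(len(xs)) if i % folds == k]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test)
    return fmean(squared_errors)

def select_algorithm(candidates, xs, ys):
    """Score every candidate and return the name with the lowest error."""
    scores = {name: cv_error(fit, xs, ys) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic, illustrative data: a noisy linear trend (an assumption, not real data).
rng = random.Random(0)
xs = [float(x) for x in range(40)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]

candidates = {"mean": fit_mean, "line": fit_line}
best, scores = select_algorithm(candidates, xs, ys)
print("selected:", best)
print("cross-validated errors:", scores)
```

In practice the candidate list would hold far richer algorithms, and choosing which ones to include, and which error measure suits the business question, is the judgment that still falls to the data scientist.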

Figure 2: Development Platforms of the Data Science Stack

Source: Saugatuck Technology, an ISG business

Even though current uses of the new data science rely on either guidance from, or projects delivered by, data scientists and machine-learning programmers, the development tools are evolving for use by more than just those people who have climbed its steep learning curve and accumulated its secrets and experience.

Another entry in the data science stack is the visualization tools used by business users. The gap in the market is being bridged from both sides: the hyper-cool dashboard tools used for business intelligence (BI) deployments over spreadsheet and structured SQL data sources are being extended to work with the new data science analytics and applications, while the new data science visualization tools are crossing the gap toward non-programmers.

Although the current crop of visualization tools has limitations, the BI-oriented tools and those growing up from data science visualizations will vie with each other to fill out the data science development stack. These evolved capabilities – some delivered as on-premises software but most as Cloud subscriptions – will further enable uses of the new data science beyond the Global 1000, by business generalists and subject matter experts who are not programming or data science mavens, and for purposes not even dreamed about today.

Net Impact

The rapid evolution of the components of the data science stack overwhelmingly favors using Cloud subscriptions rather than on-premises software and hardware that will be out of date next quarter. The insourced alternative involves hiring staff and using subscription services for the new data science stack to develop and operate solutions. The outsourced alternatives use outsourced or offshored custom development to augment, complement, or staff internal projects.

If the business application is other than customer focused, or is focused on an industry-defining solution that may lead to competitive advantage, then a custom-developed solution using the data science stack, data scientists, programmers and business analysts is the more sensible path. Even here, the development, testing and pilot stages are more likely to use Cloud subscriptions because a majority of the data science stack involves the use of Cloud-native services.

For smaller independent software vendors (ISVs) and startups looking to meet enterprise needs, the sage advice is that your value will be in your talent. Remaining independent means focusing on an area that is not addressed by other vendors, that is in demand by targeted enterprises, and in which you can deliver the best solution in its category.

For some master-brand IT providers, the most obvious desire may be to own the market for the entire data science stack, from new algorithms all the way through visualization tools for business users and the full application stack. Rather than becoming defocused by the enormity of this task, it may be prudent for large master brands to focus on excelling at a portion of the stack, and to add value organically or through acquisitions.

Guidance

Data science development platforms are useful to the enterprise if – and when – it needs to own part of its destiny tied to data science, needs to execute a core part of its business strategy using data science, or needs to own industry breakthroughs that will likely be years away. Other reasons may include forward or reverse integration of critical value chains, with data science applied across them.

IT Leaders: Use development tools for the critical, must-have applications that learn from data. Otherwise, look to ISV Cloud application subscriptions that can deliver 90 percent of the business use case with minor customizations. Expect continuing change in the data science marketplace, in its techniques and tools, its people and processes, and its uses and value, as this new science that uncovers secrets hiding in silos of data, finds business insights, anticipates future outcomes, and recommends choices makes its way through the organization from the boardroom through operations.

IT providers: Run, do not walk, to the Cloud for anything data science. Add value where you have a foothold, but be aware that the next great market movement will be through business analysts and users. Any IT provider seeking to dominate the market will want to own this constituency. This means the visualization portion of the stack aimed at non-programmers, business analysts, and business users is now up for grabs, and is a market lever that can crown a successor to the spreadsheet for the era of digital business.
