How Hybrid Analytics Improves Real-Time Data-Driven Insights – The New Stack
Mukund Sreenivasan
Mukund is a senior sales engineer at Couchbase. Prior to that, he held Solution Architect positions at Acoustic, Automation Anywhere, Tealium, Clicktale and others.
You are hungry for a pizza as you compose a message for your newly found group on a business networking site from your mobile phone, when the phone pings with a happy hour coupon for your favorite pizza on your next order.
Why was the coupon for your most ordered pizza presented to you around 3:00 p.m.? And before that, how did you even find out that this group existed on your professional business networking site? And why did the coupon arrive as a push message and not an email?
Many of these nudges are the result of algorithms that took data scientists months or even quarters to refine, and then just as long to roll out to their client audiences.
Customer experience has quickly become the most important competitive differentiator and ushered the business world into an era of monumental change. As part of this revolution, companies are interacting digitally – not only with their customers, but also with their employees, partners, suppliers and even their products – on an unprecedented scale.
These interactions, which create mountains of data, are powered by the Internet and other 21st century technologies — and at the heart of the revolution are the cloud, mobile, social media, big data and IoT applications of a business.
There is no doubt that enterprises are recognizing the value of data from emerging applications. And they’re making strategic decisions to embrace technologies that allow them to glean insights from growing amounts and types of data to offer customers a level of personalization we’ve never seen before.
However, the aspiration to be more data-driven is easier said than done. It involves not only the adoption of new technologies that enable the use of machine learning models against real-time operational data, but also new approaches to data management.
NoSQL databases offer opportunities but introduce new challenges
To adapt to this new business landscape, organizations need to identify database technology that meets business needs. One such solution is a NoSQL document database, which stores data as documents instead of the traditional rows and columns used by relational databases. NoSQL means “not only SQL” rather than “no SQL” at all.
This means that a NoSQL JSON database can store and retrieve data using the ANSI SQL you are familiar with, combined with the flexibility of JSON. It’s the best of both worlds. NoSQL databases are designed to be high-performance, flexible and scalable, and to respond quickly to the data management demands of modern enterprises.
Developers and customers face multiple modern data and analytics challenges when working with JSON data. In an environment of increasing volumes and variety of data, businesses are asking developers to make data available in real time for further analysis. Traditional data warehouses require schema changes and ETL (extract, transform, and load) pipeline changes whenever application data changes.
This prevents analysts from quickly gaining insights into what’s happening in their business, hampering their ability to monitor and optimize it and to run analytics in a way that doesn’t interfere with operational systems.
Additionally, there is a growing need for complex ad-hoc analytical calculations and queries with large joins, aggregations, and groupings. In this context, DevOps teams need to be able to offload data processing requests from underlying transactional data services to analytical data processing engines that support parallel query processing and bulk data management.
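As a rough illustration of the kind of ad-hoc query described above, the following Python sketch joins, groups, and aggregates a handful of JSON-style documents in memory. The document shapes, the field names (customer_id, region, total), and the revenue_by_region helper are all hypothetical; an analytical engine would run the equivalent query in parallel over far larger data.

```python
from collections import defaultdict

# Hypothetical JSON-style documents, as an application might store them.
customers = [
    {"id": "c1", "region": "west"},
    {"id": "c2", "region": "east"},
]
orders = [
    {"order_id": 1, "customer_id": "c1", "total": 20.0},
    {"order_id": 2, "customer_id": "c2", "total": 35.5},
    {"order_id": 3, "customer_id": "c1", "total": 14.5},
]


def revenue_by_region(customers, orders):
    """Join orders to customers, then group by region and aggregate."""
    region_of = {c["id"]: c["region"] for c in customers}  # join index
    summary = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for order in orders:
        region = region_of.get(order["customer_id"])
        if region is None:
            continue  # inner join: skip orders with no matching customer
        summary[region]["orders"] += 1
        summary[region]["revenue"] += order["total"]
    return dict(summary)


result = revenue_by_region(customers, orders)
```

Running this kind of scan against the transactional store itself is what competes with incoming writes, which is why the work is offloaded to a separate analytical engine.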
Treat data pipeline indigestion with hybrid analytics processing
As the underlying data grows, it is necessary to manage various data streams from one system to another, which increases the number of data pipelines needed to perform ETL processing. This often involves pipelines that transform operational data before analytical tools can process it.
And, of course, all of this increases time and cost, adds more systems and processes to maintain, and makes the overall data platform and infrastructure even more complex. In other words, it can lead to a serious case of data pipeline indigestion!
The traditional paradigm of thousands of users with known data structures was, and still is, well served by the traditional relational database structure of rows and columns. But that structure doesn’t meet the need for the higher-volume, schema-free throughput and agility demanded by web applications. NoSQL database technology evolved from these use cases and can support millions of transactions per second across millions of users.
Analytics at the speed of transactions
A hybrid operational and analytical processing (HOAP) architecture “breaks the wall” between transaction processing and analytics, enabling more informed, “real-time” decision-making.
In fact, a recent academic paper presented at the IEEE International Conference on Big Data stated that database system architectures with support for hybrid data management, known as HTAP (hybrid transactional/analytical processing) or HOAP (hybrid operational/analytical processing), are emerging and gaining more and more traction in the commercial and research sectors.
The generally accepted practice for managing both transactional/operational workloads and analytical workloads is to separate them, with each workload running in a separate system. The fact that one process can get in the way of the other—that long-running analytical queries affect incoming transactions, for example—is just one of the many reasons why it makes sense to separate these two workloads. Hybrid analytics combines both operational transactions and analytics into a single system or platform and eliminates ETL delays. Once again: Analytics at the speed of transactions!
Optimize real-time analytics with a HOAP data platform
Part of the appeal of complete HOAP systems, that is, systems capable of hybrid operational and analytical processing that include both OLTP (transactions) and OLAP (analytics) in a single implementation, is more than just having fewer systems to maintain; it’s also the ability to perform analytics on incoming operational transactions in near real time and even use your ML algorithms as part of your queries. Products that provide these features include Couchbase Server, Microsoft SQL Server, and SAP HANA, the first being a NoSQL database and the latter two structured (relational) databases.
Now let’s take a look at how a HOAP data platform can give businesses the ability to quickly, reliably and efficiently run near real-time analytics on operational data without the need for ETL, avoiding both the potential for data leakage during data transfer and the wasted time, to deliver the right nudges that help their customers uncover opportunities, thwart unwanted risks, and improve their overall experience.
Much of the data requiring analysis by data scientists today is in JSON format. The data scientist’s job therefore requires converting JSON data from its document format into a format that works for their OLAP tables (rows and columns), which takes time and technology before the data can be used to develop ML algorithms. A hybrid NoSQL database can combine JSON operational data, and all the benefits it offers, with the ability to run analytics on that data using a structured query, without the time wasted to extract, transform and finally load the data before the analysis can start.
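To make that conversion step concrete, here is a minimal sketch of what flattening a nested JSON document into rows can look like. The order document, its field names, and the flatten_order helper are invented for illustration. The point is only that every change in the document's shape forces a matching change in code like this, which is the work a hybrid platform aims to avoid.

```python
import json

# Hypothetical operational document, as the application might store it.
raw = """
{
  "order_id": 7,
  "customer": {"id": "c1", "city": "Austin"},
  "items": [
    {"sku": "pizza-lg", "qty": 1, "price": 18.0},
    {"sku": "soda", "qty": 2, "price": 2.5}
  ]
}
"""


def flatten_order(doc):
    """Turn one nested order document into one flat row per line item."""
    rows = []
    for item in doc.get("items", []):
        rows.append({
            "order_id": doc["order_id"],
            "customer_id": doc["customer"]["id"],
            "city": doc["customer"]["city"],
            "sku": item["sku"],
            "qty": item["qty"],
            "line_total": item["qty"] * item["price"],
        })
    return rows


rows = flatten_order(json.loads(raw))
```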
The traditional model for training an ML algorithm is unchanged; users can import and operationalize an ML model as a callable function as part of a query. This is unique because it helps the data scientist not only improve and test an algorithm by exploring it on queried subsets of data, but also deploy it for use on the near real-time operational data on the analytical side of the HOAP system.
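The idea of calling a model from inside a query can be sketched with Python's built-in sqlite3 module, whose create_function method registers a Python callable as a SQL function. SQLite is used here purely as a stand-in for a query engine; this is not Couchbase's API, and the table, the coupon_score function, and the model weights are hypothetical values made up for the example.

```python
import math
import sqlite3

# Hypothetical pre-trained logistic model. Real weights would come from a
# training job; these values are made up for illustration.
WEIGHTS = {"bias": -2.0, "orders_last_30d": 0.4, "is_happy_hour": 1.0}


def coupon_score(orders_last_30d, is_happy_hour):
    """Return the modeled probability that a coupon leads to an order."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["orders_last_30d"] * orders_last_30d
         + WEIGHTS["is_happy_hour"] * is_happy_hour)
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Expose the Python model to SQL as a two-argument function.
conn.create_function("coupon_score", 2, coupon_score)

conn.execute(
    "CREATE TABLE activity "
    "(customer TEXT, orders_last_30d INTEGER, is_happy_hour INTEGER)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [("ana", 5, 1), ("ben", 0, 0), ("cy", 3, 1)],
)

# The model is evaluated inside the query, row by row, on current data.
targets = conn.execute(
    "SELECT customer FROM activity "
    "WHERE coupon_score(orders_last_30d, is_happy_hour) > 0.5 "
    "ORDER BY customer"
).fetchall()
```

Because the model is just another function in the query, the same statement can be used to explore a subset of data during development and to score live records once deployed.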
Many use cases can benefit from a NoSQL database that combines high transactional performance with the ability to execute structured queries whose results can then be exposed to other queries.
In the retail industry, for example, the ability to offer the right product recommendation or coupon to a customer as they progress through the checkout process has been shown to increase sales. Systems today tend to use static algorithms, but by replacing them with a Python ML model that can then be applied to the latest operational data, businesses can improve efficiency and further increase sales based on data-driven insights.
In all fraud-prone industries, including financial services, retail, and travel, identifying anomalies in user behaviors and indicators of compromise is critical. However, looking at thousands or millions of transactions per second with a structured query alone is unrealistic, especially if you want to act in less than a second. A HOAP platform capable of using more insightful ML algorithms can help organizations analyze transactions quickly and at scale to more effectively identify fraud.
Another use case we’re seeing more and more is for banks and financial institutions that need to calculate risk scores, which requires organizations to perform complex analytical queries, calculations, and aggregations on JSON data enriched with third-party data.
Couchbase Server optimizes data pipelines
Large enterprises use Couchbase to solve the above use cases and many more by leveraging Couchbase Server’s Analytics service, a hybrid operational/analytical processing database. This helps modern businesses increase revenue, reduce risk profiles, and improve operational efficiency.
Additionally, customers have appreciated the benefits of isolating their operational and analytical workloads within the same data platform instance – avoiding ETL, performing analytics on the same data model as the application, avoiding performance interference and thus avoiding data pipeline indigestion. This is the key to providing real-time insights into real-time operational data.
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Real.
Image by Benjamin Hartwich from Pixabay