Managing Big Data with Informer

Madhavi Chandra, Director of Informer Product Management, shares how Informer’s Dataset architecture, particularly its Elasticsearch-backed features, can help you manage Big Data.

When data is abundant, finding actionable insights can be like finding a needle in a haystack. The key to extracting insights from big data is preparing the Dataset in a meaningful way and then being able to interact with it efficiently. If obtaining results is cumbersome or time-consuming, the value of the Dataset quickly diminishes.

For many of our customers, Datasets with tens or hundreds of millions of rows and hundreds of columns are common. It is essential that their experience interacting with such large volumes of data is fast, efficient, and accurate, and that it delivers the business insight they need to elevate their organization.

With Informer, you can curate as much data as you like in a Dataset and then pare down views simply by filtering. Filtering eliminates the need for multiple Datasets and supports a single source of truth that contains the entire data picture. There is no need to query your production database each time you interact with the Dataset; users with the appropriate permissions can refresh the Dataset on a schedule that suits organizational needs, or refresh it on demand.

In Informer, every column field chooser is an efficient, filterable view, so you can quickly find a field and focus the view on just the fields you want to see at the moment. You could reduce a 200-field Dataset to the five columns necessary for a particular task, then rinse and repeat with whichever set of columns is most relevant to the next one. This intuitive, efficient column selection makes interacting with large, wide Datasets and Reports seamless and keeps rendering in the UI fast.
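To make the idea concrete, here is a rough sketch of how a narrowed, filtered view of a wide Dataset could translate into an Elasticsearch search request. This is illustrative only, not Informer's actual implementation; the index name, field names, and `region` filter are hypothetical.

```python
def narrowed_view(index, fields, region):
    """Build an Elasticsearch-style search request that returns only the
    chosen columns, filtered to one region -- the production database is
    never re-queried, only the already-indexed Dataset.

    Hypothetical example: index and field names are made up.
    """
    return {
        "index": index,
        "body": {
            "_source": fields,  # return only these columns of the wide Dataset
            "query": {
                "bool": {
                    # 'filter' context: no scoring, cacheable, fast on big indexes
                    "filter": [{"term": {"region": region}}]
                }
            },
        },
    }

# Pare a (hypothetical) 200-field Dataset down to five columns for one task.
request = narrowed_view(
    "sales_dataset",
    ["order_id", "customer", "region", "amount", "order_date"],
    "EMEA",
)
```

Because only the `_source` fields you ask for come back over the wire, narrowing a 200-field Dataset to five columns shrinks both the response payload and the rendering work.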

Data inherently grows over time. To make refreshing a Dataset both flexible and efficient, Informer provides several options that innately support large Datasets: you can replace all records, add only new records, or add new records while updating existing records that have changed. Refreshing a multi-million-row Dataset is thus painless and fast!
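For a sense of how those three refresh options could map onto Elasticsearch's bulk API, here is a hedged sketch. It is not Informer's actual implementation; the index name, the `order_id` id field, and the mode names are assumptions for illustration.

```python
def bulk_actions(records, mode, index="sales_dataset"):
    """Yield Elasticsearch bulk-helper action dicts for one refresh.

    Hypothetical mapping of the three refresh options:
      'replace' -> re-index every record (after clearing the index)
      'append'  -> 'create' ops only, so existing ids are left untouched
      'upsert'  -> update changed records, insert records with new ids
    """
    op = {"replace": "index", "append": "create", "upsert": "update"}[mode]
    for rec in records:
        action = {"_op_type": op, "_index": index, "_id": rec["order_id"]}
        if op == "update":
            action["doc"] = rec
            action["doc_as_upsert"] = True  # insert the doc if the id is new
        else:
            action["_source"] = rec
        yield action

# Incremental refresh: only changed/new rows travel, not the whole Dataset.
changes = [{"order_id": 101, "amount": 40.0}, {"order_id": 9001, "amount": 12.5}]
actions = list(bulk_actions(changes, "upsert"))
```

The upsert path is what makes incremental refreshes of multi-million-row Datasets cheap: only the delta is shipped and indexed.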

Informer provides multiple methods for augmenting and preparing data in large Datasets. Flow steps are actions applied to Query results after the Query has run and before the Dataset is indexed. They let you create new fields and modify existing ones based on the Query results or user input; the data is then indexed for interaction and retrieval.

Elasticsearch Script Fields, on the other hand, let you add new fields to your Dataset that are evaluated inside Elasticsearch each time the field is referenced. What does that really mean? You can augment your Dataset without re-indexing! Think of it as a calculated-column Flow Step that runs after indexing: because script fields execute quickly and no re-indexing is required, the performance gains on large Datasets are huge. Script Fields have many use cases, and they make it especially easy to compute fields that are ‘as of’ a specific date and time, such as an Age, Late, or Days Since field. These real-time Script Fields are computed on the fly when you invoke them, without refreshing the Dataset, which adds up to big time savings!
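The `script_fields` clause below shows what such a query-time "Days Since" field can look like in raw Elasticsearch terms. The Painless script is evaluated per document at search time, so the answer is always current without re-indexing. The `order_date` field name is a hypothetical example, and passing `now` as a parameter is one common way to keep the script deterministic; Informer's generated scripts may differ.

```python
import time

def days_since_query(date_field="order_date"):
    """Build a search body with a script field computing whole days
    elapsed since a (hypothetical) date field, evaluated at query time."""
    now_ms = int(time.time() * 1000)  # current time, passed in as a param
    return {
        "query": {"match_all": {}},
        "script_fields": {
            "days_since": {
                "script": {
                    "lang": "painless",
                    # date field value -> epoch millis, subtract from 'now',
                    # integer-divide by millis-per-day
                    "source": (
                        f"(params.now - doc['{date_field}'].value"
                        ".toInstant().toEpochMilli()) / 86400000L"
                    ),
                    "params": {"now": now_ms},
                }
            }
        },
    }
```

Run the same search tomorrow and `days_since` is simply one larger for every row, with zero refresh or re-index work in between.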

We have designed Informer so that our customers can have efficient, effective, and intuitive interactions when working with their data, no matter how large the Dataset.
