

True Self-Service Reporting

Self-service reporting has long been a goal of the IT departments I consult with. They want to hand off query writing and data retrieval to the end user. Basically, they want their user base to “write their own reports.” Several solutions on the market today attempt to provide this relief.

Most BI solutions or reporting tools within enterprises are connected to a database that users can query. To make this work for the user base, the IT department creates a modeling layer that exposes subsets of tables and of the columns within those tables. In the most basic use case, IT sets up security credentials, and the end user logs in, runs an SQL query, and gets back a grid of data to work on.

This process seems straightforward: ask the tool for the data you want, and you get it back. However, the data model has to be clear enough that users know which columns represent which data. The modeling scheme can be simplified by making field names more understandable. That’s fine for a simple “Person” table containing trivial columns (first name, last name, city, state) when the query is “people that live in PA”: the user clicks on “state”, selects “equals”, types in “PA”, selects the output columns, and runs the report. Some tools handle self-service reporting in exactly this simplistic way. Even more sophisticated tools that offer visualization and other reporting options still rely on the end user to know where the data is and how it’s structured.
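To make that concrete, here is a minimal sketch (Python with SQLite, using an assumed Person schema and assumed friendly labels, not any particular tool’s implementation) of what such a tool does on the user’s behalf: a modeling layer maps friendly names to physical columns, and the point-and-click selections become a SQL query.

```python
import sqlite3

# The "model": friendly labels -> physical columns (assumed schema).
PERSON_MODEL = {
    "First Name": "person.first_name",
    "Last Name":  "person.last_name",
    "City":       "person.city",
    "State":      "person.state",
}

def build_query(output_labels, filter_label, value):
    """Translate the user's point-and-click selections into SQL."""
    cols = ", ".join(PERSON_MODEL[label] for label in output_labels)
    sql = f"SELECT {cols} FROM person WHERE {PERSON_MODEL[filter_label]} = ?"
    return sql, (value,)

# "People that live in PA", as the tool would run it.
sql, params = build_query(["First Name", "Last Name"], "State", "PA")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (first_name, last_name, city, state)")
conn.execute("INSERT INTO person VALUES ('Ada', 'Lovelace', 'Erie', 'PA')")
print(conn.execute(sql, params).fetchall())  # -> [('Ada', 'Lovelace')]
```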

Running More Complex Queries

Once the query gets more complex, the user is going to require more knowledge, or “domain knowledge”. Suppose the query is for specific types of transactions run by specific types of persons – let’s say sales transactions run by sales reps who also have person records. This query hops among three tables, and the hops themselves may be indirect. We may not be hopping from “Sales Transactions” to “Sales Reps” to “Person.” The query may go from “Transactions” to “Sales Rep Transaction Middle Table” to “Sales Rep” to “Sales Rep Middle Table” to “Person.” For their report to be correct, users will need to know that they’ve got to make these hops. Although this is a rather trivial situation for a database query, the user will need a certain amount of domain knowledge to create the report.
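Written out as SQL, those indirect hops become a chain of joins the user would otherwise have to know about. A sketch, where every table and column name is an assumption for illustration:

```python
# Hypothetical schema: the indirect hops from the example above, written as SQL.
# None of these table or column names come from a real system.
PA_SALES_BY_REP = """
SELECT p.first_name, p.last_name, t.amount
FROM transactions            AS t
JOIN sales_rep_transaction   AS srt ON srt.transaction_id = t.id
JOIN sales_rep               AS sr  ON sr.id = srt.sales_rep_id
JOIN sales_rep_person        AS srp ON srp.sales_rep_id = sr.id
JOIN person                  AS p   ON p.id = srp.person_id
WHERE t.type = 'SALE'
"""
```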

Accessing a broader set of data is going to be intimidating for a lot of users. Fortunately, an ideal self-service reporting solution removes the requirement of knowing how tables and models interact. It simply lets you say, “I want to find everyone from PA.” So the question is: how can you empower your users with a truly self-service solution that makes all sorts of analysis and reporting easy?

Entrinsik offers an answer. With Entrinsik’s Informer you can create Informer Datasets: curated, single sources of truth that bring in data from each of your different Data Sources. A data domain expert would have authorization to create a very large Dataset that might include transactions, sales rep names, sales rep addresses, last year’s sales, person information, and so on. The Dataset could easily have fifty columns and millions of rows.

Informer makes filtering this data very easy by showing you the columns available to filter on to support your reporting and analytics needs. For example, you can look at a column heading and say, “this state column is what I want to filter on to create a report.” The Dataset quickly provides a list of states, and you simply select the ones you want. You don’t even need to know that the process involved hopping across several tables, pulling data from an entirely different database, or calculating a value derived from other columns. All you know is that the data is there.
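This isn’t Informer’s implementation, but the idea is easy to sketch in Python with pandas: once the hard joins have already been done, filtering a wide Dataset reduces to listing a column’s distinct values and keeping the rows the user picks.

```python
import pandas as pd

# A toy stand-in for a wide, pre-joined Dataset (columns are assumptions).
dataset = pd.DataFrame({
    "rep_name": ["Leif", "Isabel", "Leif"],
    "state":    ["PA", "NJ", "PA"],
    "amount":   [1200.0, 800.0, 450.0],
})

# What a filter UI does behind the scenes: list the distinct values...
print(sorted(dataset["state"].unique()))        # ['NJ', 'PA']

# ...then keep only the rows for the values the user picked.
report = dataset[dataset["state"].isin(["PA"])]
print(report)
```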

The Open Query Challenge

Among the challenges associated with traditional self-service reporting is the open query problem. When people are permitted to run self-service database queries, they often choose the columns they want first. Then they run a wide-open, unfiltered query, maybe during the day while production and sales transactions are going on, and the system pulls back an enormous amount of data. They scan the first page and iterate until they figure out how to shrink the result set to the scope they want. These queries are often huge, and the database gets hit repeatedly, slowing the system down with unnecessary CPU activity on the database side. That doesn’t need to happen if the data can be offloaded into a staging solution.

As an IT strategy, data warehousing solved this issue by redirecting people away from hitting the production database. With this strategy, either a manual or automatic infrastructure extracts data from the production database and copies it onto another database that individuals can access for reporting purposes. However, the skillsets necessary to design the ETL processes that populate the data warehouse are typically rare. In many cases the users with domain knowledge of the production database need to collaborate with IT staff to create these warehouses. In addition, giving individual network users access to several data marts or warehouses creates a distributed security problem: control over “who can access what” ends up scattered across the data landscape.
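At its simplest, that warehousing strategy is an extract-and-copy loop. A minimal sketch, with in-memory SQLite standing in for the real production and reporting servers and an assumed transactions table:

```python
import sqlite3

# In-memory databases stand in for separate production and reporting servers.
production = sqlite3.connect(":memory:")
reporting  = sqlite3.connect(":memory:")

# Seed the "production" side so the sketch is self-contained.
production.execute("CREATE TABLE transactions (id, state, amount)")
production.execute("INSERT INTO transactions VALUES (1, 'PA', 1200.0)")

# Extract from production once...
rows = production.execute("SELECT id, state, amount FROM transactions").fetchall()

# ...and load into the reporting copy, which absorbs all the query traffic.
reporting.execute("CREATE TABLE IF NOT EXISTS transactions (id, state, amount)")
reporting.execute("DELETE FROM transactions")  # full refresh: replace everything
reporting.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
reporting.commit()
print(reporting.execute("SELECT COUNT(*) FROM transactions").fetchone())  # (1,)
```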

Informer’s approach uses an Informer Dataset as a barrier between the user and the production database. With Datasets, designers don’t need to know sophisticated ETL techniques, and users don’t need to know much to produce sophisticated reporting. You don’t run the risk of users interfering with the production database, because users act on the Datasets only; Informer handles that load and buffers the data source. And Informer’s security model keeps “who can access what” in one central place.

Reporting Consistency

Another issue with self-service reporting is consistency among analytics. Suppose Leif runs his report for “sales figures for PA sales reps,” and Isabel runs her “PA sales by salesperson” report. Often their row counts, and the aggregations built on them (like “total sales”), will differ. This is a problem whether they report directly against the database or against a data warehouse. Informer solves it with the Dataset: once the concept of “sales transactions” is defined and the potentially complicated filtering is agreed upon (e.g., “exclude intra-division transactions”), Leif and Isabel both compile their reports from the same Dataset.
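The fix is structural: both reports draw from one shared definition instead of two private ones. A sketch in Python with pandas, where the column names and filters are assumptions based on the example above:

```python
import pandas as pd

# One agreed-upon definition of "sales transactions", shared by every report
# instead of being re-implemented per user. Filters are illustrative assumptions.
def sales_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    return raw[(raw["type"] == "SALE") & (~raw["intra_division"])]

raw = pd.DataFrame({
    "rep":            ["Leif", "Isabel", "Leif"],
    "state":          ["PA", "PA", "PA"],
    "type":           ["SALE", "SALE", "REFUND"],
    "intra_division": [False, True, False],
    "amount":         [100.0, 200.0, 50.0],
})

dataset = sales_transactions(raw)                            # the shared Dataset
leif   = dataset[dataset["state"] == "PA"]["amount"].sum()   # Leif's total
isabel = dataset.groupby("rep")["amount"].sum().sum()        # Isabel's total
print(leif == isabel)  # True: both agree because the source definition is shared
```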

We have said that a data domain expert can author an Informer Dataset. Where does the data domain expert come from? Typically, larger organizations employ data domain experts: people who understand the data source schema inside and out. These individuals can construct broad Datasets (for example, “Sales Transactions”) that may be used for self-service reporting. Even at smaller organizations, though, lots of people use database queries or Excel spreadsheets in their work, and typically there are users who compile the data for others. In a scenario like “Bob always uses this spreadsheet that he reviews and has final say on, and then he distributes it for everyone to use,” clearly Bob is the person who should create the Informer Dataset. Instead of cutting, pasting, positioning, and duplicating values from a data source into spreadsheets, Bob will have a much easier time designing an Informer Dataset. Moreover, that Dataset can be secured in Informer so only the team that needs it can see it, which avoids the security concerns of passing around unsecured spreadsheets.

One interesting Entrinsik Informer use case comes from a higher education customer. Management wanted to compare, per cohort, the number of students still present in their second fall semester, second spring semester, third, and so on. A manager handled this with multiple Excel spreadsheets. At a set time each semester, she saved the spreadsheets as a point-in-time snapshot for reporting. Then she combed through each spreadsheet, student by student, to find each student’s status at that time and copied the information into cells in yet another spreadsheet. It took her weeks, but it had to be done because school management wanted to analyze the information.

Informer helped by letting her upload the spreadsheets into separate Informer Datasets, with a summary Dataset created on top that links to each of the individual Datasets. Now all she does is upload the new census data and other data she gathers into the separate Datasets, and the summary Dataset comes together for her with no manual work. She just runs the Dataset and it all shows up. Because she had so much experience with this information, she knew what data she needed and how to get to it; she just didn’t have a way to automate it. Now she owns that Dataset for everyone to use, and the team does far more analysis than the one thing it was doing before.

Data Latency Concerns

What about data latency? What’s an acceptable delay between the production database and the reporting database for your users? What do you do when the sales folks say they need up-to-date information now and can’t wait for the reporting database to be updated overnight? How much elapsed time is acceptable for them to pose questions on the current data and still get valid answers as of when the query ran: one hour, six hours, once a day? The answer varies case by case, but fortunately an Informer Dataset can refresh on a schedule you establish around what is convenient for your operations. So if users need the data refreshed each hour, Informer can do that.
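Informer schedules this internally; purely as an illustration of the principle, here is the same idea sketched with Python’s third-party `schedule` library, where each dataset refresh is just a job on its own cadence.

```python
import time
import schedule  # pip install schedule; a stand-in, not Informer's scheduler

def refresh_sales_dataset():
    # Placeholder for the real extract-and-reindex work.
    print("refreshing Sales dataset...")

def refresh_inventory_dataset():
    print("refreshing Inventory dataset...")

# Each dataset gets its own cadence, chosen to fit operations.
schedule.every().hour.do(refresh_sales_dataset)
schedule.every().day.at("02:00").do(refresh_inventory_dataset)

while True:
    schedule.run_pending()
    time.sleep(60)
```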

Keep in mind that when you’re working with trends, you don’t need 100% up-to-the-minute transactions. That’s not what trends are about; they are about broad strokes. For the cases where someone does need the latest information right now, Informer has an ad hoc query facility. That doesn’t by itself solve the problem of whether the user knows how to write a query, so only certain security roles can run these.

A Solution That Works for Everyone

An ideal solution provides two types of updates: incremental refreshes that pull only the information that has changed since the last scheduled update, and full refreshes that update all records. Informer supports both. Datasets can also be maintained via Informer’s REST API: rows can be pushed into a Dataset, or a Dataset can be refreshed when a certain event occurs.
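The REST API itself is real per the above, but the sketch below only illustrates the shape of such an integration: the host, endpoints, token, and payload are all assumptions for illustration, not documented calls.

```python
import requests

# Hypothetical host, endpoints, and payload; consult Informer's actual API docs.
INFORMER = "https://informer.example.com/api"
HEADERS = {"Authorization": "Bearer <api-token>"}

new_rows = [
    {"id": 101, "state": "PA", "amount": 1200.0},
]

# Push rows into a Dataset as they are produced...
requests.post(f"{INFORMER}/datasets/sales/rows", json=new_rows, headers=HEADERS)

# ...or trigger a refresh when some upstream event fires.
requests.post(f"{INFORMER}/datasets/sales/refresh", headers=HEADERS)
```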

With Entrinsik Informer Datasets, you can provide true self-service reporting to all of your employees and enable them to find new and more productive ways to improve operations and achieve your goals.

For more please contact sales@entrinsik.com or call 888-703-0016.

This article was written by Andrew Morovati
Informer Chief Solutions Architect


Two Big Differentiators for Entrinsik Informer

Although there are many differentiators between Informer and other BI applications, here are two important benefits that customers recently told me they really like and that set Informer apart:

  1. The scheduler. Informer provides a very flexible scheduler that we call Informer Jobs. Most BI applications are backed by a datastore/warehouse: a huge data model with lots of tables whose contents are refreshed in bulk at a particular point in time, typically overnight. But if there is one particular component of that datastore/warehouse that you need refreshed more frequently, you’re stuck, or you must ask IT to go through a laborious manual process to update the information.
    If you have a data warehouse and you want just the sales component refreshed, your IT team has to go in and do it manually; it’s all or nothing, and they can’t schedule just one piece of data. With Informer Datasets, each Dataset has its own refresh schedule. You can define a Sales Dataset that refreshes on one schedule, an Inventory Dataset that refreshes on another, and every other Dataset on its own independent schedule. So if your sales data changes frequently and you want that reflected when you review your data in Informer, you can set Jobs to refresh it overnight, or whenever you want.

  2. Speed and Efficiency. Organizations are dealing with larger and larger amounts of data from many different sources, and they need to compile it into indexed systems that let users analyze and work with it quickly.
    Informer provides tremendous efficiency in working with data. Once the data is loaded into the Dataset, users can work with it very quickly (groupings, aggregations, visuals, analytics, and so on) because we index everything within the Dataset. This is a significant benefit: it’s fast and efficient, enabling you to accomplish a lot more in a shorter period of time.
    Informer can also handle multi-million-row Datasets and still group and aggregate within a few seconds, with no extended wait for that process to happen. Informer accomplishes this with a relatively new class of database technology, born out of search engines and Google-scale requirements, that lets people sort through terabytes of data to find what they are looking for.
    From a BI applications standpoint, it is usually the higher-end, more expensive applications that incorporate this technology. As a forerunner among BI applications in the SMB space, however, Informer utilizes the same technology.
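The article doesn’t name the engine, and this is not a claim about Informer’s internals, but the description (search-engine heritage, every column indexed, fast aggregation over millions of rows) matches search-derived stores such as Elasticsearch. As an illustration of that class of technology, here is the kind of group-and-aggregate request such an engine answers directly from its indexes:

```python
import requests

# Illustrative only: "group by state, sum amount" as an Elasticsearch terms
# aggregation. The host and index name are assumptions, and the article does
# not confirm which engine Informer uses internally.
query = {
    "size": 0,  # return no raw rows, just the aggregation buckets
    "aggs": {
        "by_state": {
            "terms": {"field": "state"},
            "aggs": {"total_sales": {"sum": {"field": "amount"}}},
        }
    },
}
resp = requests.post("http://localhost:9200/sales/_search", json=query)
print(resp.json()["aggregations"]["by_state"]["buckets"])
```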

For more please contact sales@entrinsik.com or call 888-703-0016.
