Self-service reporting has long been a goal of the IT departments I consult with. They long for the ability to pass along tasks involving query writing and data retrieval to the end-user. Basically, they want their user base to “write their own reports.” There are several solutions on the market today that attempt to provide such relief.
Most BI solutions or reporting tools within enterprises are connected to a database which users can query. To make this work for the user base, the IT department creates a modeling layer which represents certain subsets of tables and subsets of columns within those tables. In the most basic use cases, IT sets up security credentials for users and then the end-user logs in, runs an SQL query, and gets back a grid of data to work on.
This process seems straightforward: ask the tool for the data you want, and you get it back. However, the data modeling has to be clear enough to the users so they know which columns in the model represent which data. The modeling scheme can be simplified by making field names more understandable. That’s fine for a simple “Person” table containing trivial columns (first name, last name, city, state) and the query is “People that live in PA.” The user just clicks on “state”, selects equals, types in “PA”, selects the output columns, and runs the report. Some tools handle self-service reporting in this simplistic way. Even more sophisticated tools that offer visualization and other reporting options still rely on the end-user for knowing where the data is and how it’s structured.
Running More Complex Queries
Once the query gets more complex, the user is going to require more knowledge, or “domain knowledge”. Suppose the query is for specific types of transactions run by specific types of persons – let’s say sales transactions run by sales reps who also have person records. This query hops among three tables, and the hops themselves may be indirect. We may not be hopping from “Sales Transactions” to “Sales Reps” to “Person.” The query may go from “Transactions” to “Sales Rep Transaction Middle Table” to “Sales Rep” to “Sales Rep Middle Table” to “Person.” For their report to be correct, users will need to know that they’ve got to make these hops. Although this is a rather trivial situation for a database query, the user will need a certain amount of domain knowledge to create the report.
Accessing a broader set of data is going to be intimidating for a lot of users. Fortunately, the ideal self-service reporting solutions remove the requirement of knowing how tables and models interact. These solutions simply enable you to say, ‘I want to find out everyone from PA’. So, the question is, how can you empower your users with a truly self-service solution that makes it easy for them to do all sorts of analysis and reporting?
Entrinsik offers an answer. With Entrinsik’s Informer you can create Informer Datasets — a curated, single-source of truth that enables you to bring in data from each of your different Data Sources. A data domain expert would have authorization to create a very large Dataset that might include transactions, sales rep names, sales rep addresses, last year’s sales, person information, etc. The Dataset could easily have fifty columns and millions of rows.
Informer provides for very easy filtering on this data by showing you what columns are filtered to support your reporting and analytics need. For example, you can easily look at a column heading and say, ‘this state column is what I want to filter on to create a report’. The Dataset will quickly provide a list of states and you simply select the states you want. You don’t even know that the process involved hopping over several tables, or perhaps gathering data from an entirely different database, or even a calculation derived from other columns. All you know is the data is there.
The Open Query Challenge
Among the challenges associated with traditional self-service reporting is the open query problem. When people are permitted to run self-service database queries, they are often inclined to choose the columns they want first. Then they run a wide open, unfiltered database query – maybe during the day when production and sales transactions are going on – and the system pulls out an enormous amount of data. They scan the first page and then iterate over this until they figure out how to get the query to shrink the set of data to the scope they want. Often these queries are huge, and your database gets hit repeatedly, which results in the system slowing down due to a lot of unnecessary CPU activity on the database side. That doesn’t need to happen if the data can be offloaded into a staging solution.
As an IT strategy, data warehousing solved this issue by redirecting people away from hitting the production database. With this strategy, either a manual or automatic infrastructure extracts data from the production database and copies it onto another database which individuals can access for reporting purposes. However, the skillsets necessary to design the ETL processes to populate the data warehouse are typically rare. In many cases the users with the domain knowledge of the production database need to collaborate with IT staff in order to create these warehouses. In addition, giving individual network users access to several data marts or warehouses creates a distributed security problem. For example, controlling “who can access what” that’s stored in a distributed way across the data landscape.
Informer’s approach uses an Informer Dataset as a barrier between the user and the production database. With Datasets, designers don’t need to know sophisticated ETL techniques, and users don’t need to know a lot to produce sophisticated reporting. You won’t run the risk of users interfering with the production database because Informer handles that load for them. Since these users are acting on the Datasets only, Informer acts as a buffer between the user and data source. And Informer’s security model ensures “who can access what” is located centrally.
Another issue with self-service reporting is consistency among analytics. Suppose Leif runs his report for “Sales figures for PA sales reps,” and Isabel runs her “PA sales by salesperson” report. Often, their row counts, and subsequent aggregations (like “total sales”), will differ. This can be a problem, no matter whether they are given access to report directly against the database or against the data warehouse. Informer solves this with its Dataset. Once the concept of “sales transactions” can be defined, and potentially complicated filtering is agreed upon (e.g. “Exclude intra-division transactions”), Leif and Isabel can use this Dataset to compile their reports.
We have said that a data domain expert can author an Informer Dataset. Where does the data domain expert come from? Typically, larger organizations employ data domain experts — people who understand the data source schema inside and out. These individuals can construct broad Datasets (for example “Sales Transactions”) that maybe used for self-service reporting. However, even at smaller organizations, lots of people utilize database queries or Excel spreadsheets in their work. And typically, there are users who compile the data for others to use. In a scenario like “Bob always uses this spreadsheet that he reviews and has final say on, and then he distributes it for everyone to use” — clearly the person to be creating the Informer Dataset is Bob. Instead of Bob cutting, pasting, positioning and duplicating values from one data source into the spreadsheets, he will have a much easier time designing an Informer Dataset. Moreover, that Dataset can be secured in Informer so only the team of users that need it can see it. This avoids security concerns of passing around unsecured spreadsheets.
One interesting Entrinsik Informer use case comes from a higher education customer. Management wanted to compare, per cohort, the number of students that were present for their second fall semester, second spring semester, their third, etc. This was handled by a manager working with multiple Excel spreadsheets. At a certain time during each semester she saved these spreadsheets as a snapshot in time for reporting purposes. Then, she looked at each of these spreadsheets for each student to find their status at that time and added the information into certain cells in a different spreadsheet. It took her weeks to do this, but she needed to accomplish this because school management wanted to analyze the information.
Informer was able to help her by uploading the spreadsheets into separate Informer Datasets. Then, a summary Dataset was created that linked to each one of the individual Datasets. All she needs to do is simply upload the new census data and other data she gathers into the separate Datasets and then Informer creates a summary Dataset for her. She doesn’t have to do any manual work. She just runs the Dataset and it all shows up for her. Because she had a lot of experience working with this information, she knew what data she needed, and she knew how to get to the data. However, she didn’t know any way to automate it. Now she owns that Dataset for everyone to use, and they do a lot more analysis now than just the one thing they were doing before.
Data Latency Concerns
What about data latency? What’s an acceptable amount of data latency between the production database and the reporting database for your users? How do you solve for when the sales folks say, for example, they need up-to-date information now and they can’t wait for the reporting database to be updated overnight? How much transpired time is acceptable for them to pose questions on the current data and still have valid answers based on when the query ran — one hour, six hours, one time a day? The answer depends on a case-by-case basis but fortunately Informer’s Dataset can refresh on a schedule that you establish based on what is convenient for your operations. So, if users need to have the data refreshed each hour, Informer can do that.
Keep in mind that when you’re working with trends, you don’t need 100% up-to-the-minute transactions. That’s not what trends are about. They are about broad strokes. However, in the case where someone does need to know the latest information “right now.” Informer has an ad hoc query facility. That doesn’t solve the problem regarding whether the user knows how to run a query, so only certain security roles can run these.
A Solution That Works for Everyone
An ideal solution provides for two types of updates: refresh only the information that has changed since the last scheduled update and update all records. Informer supports both. Datasets can also be maintained via Informer’s REST API. Rows can be pushed into the Dataset, or the Dataset may be refreshed upon a certain event occurring.
With Entrinsik Informer Datasets, you can provide true self-service reporting to all of your employees and enable them to find new and more productive ways to improve operations and achieve your goals.
For more please contact firstname.lastname@example.org or call 888-703-0016.
This article was written by Andrew Morovati
Informer Chief Solutions Architect