
Partnering with the IT organization

Many prospective customers tell us that creating complex reports is challenging. The biggest hurdle is getting data out of their systems: they have to ask the IT staff to write the necessary queries to pull a report or to create post-query filters and visuals. If the report isn’t quite right, they must go back to IT and submit another request.

For companies with only one person who knows the database language and who can code, the dependency on that person can create roadblocks to productivity. Unfortunately for end users, IT staff are often busy juggling many projects, so it can take a while to get a report.

IT departments have a legitimate concern with users accessing data directly, because reports created by different individuals within the organization can easily become inconsistent with one another. For organizations whose IT policy is to control the data, Informer delivers the perfect solution: using Informer Datasets, IT can create connections to all of their different data warehouses while maintaining data governance and security over who has access to the data.

Because the data within Datasets has been curated through the data governance process, a huge burden is lifted from the IT staff, who can then allocate more time to critical projects. By designating a few technical people as data owners within their respective departments, IT managers find it easier to offload more reporting responsibility to users. And with Dataset access granted by IT, users can create self-service reports that meet their needs.

To learn more about the functionality mentioned here or any other Informer functionality, please contact Informer Support with questions or to schedule training.


Informer 5 Delivers Greater Data Access and Extensibility

Over its existence, Informer has gone through major evolutionary changes in the areas of data access and management, data governance, and reporting. Informer 3 and Informer 4 did a great job in setting up mappings that allowed users to directly query a production database and pull reports. These earlier versions of Informer sandboxed portions of the system and fields that end users were allowed to see, and they enabled organizations to set up rules for tables that users could query.

However, Informer 3 and 4 faltered by allowing end users to create queries against the production database and pull too much data. These versions also allowed users to query tables in ways that were inefficient and slowed down the system, for example by selecting records from a multi-million-row table based on criteria that were not indexed.

Informer 5 Data Access

With Informer 5, we still allow ad hoc queries against the database, but we’ve taken a huge leap forward by creating Dataset functionality. Because Datasets can be scheduled to refresh automatically at any frequency, Admins don’t have to worry about users running queries that take all day. Another great benefit of Datasets is that they are always fast because the data has already been indexed.

With Dataset capability, a DBA-type person can create a very large set of data and compartmentalize it into separate views of information that end users can work with in a simple and secure way. This eliminates the need to create separate sets of data for each person. It also enables you to establish strong data governance by creating a single Dataset based on your rules, including limiting users to seeing only the portions of the data appropriate to their role in the organization.

Data access and end-user security are challenges faced by many organizations. With the Dataset’s user-based filters, Informer 5 eliminates these challenges by giving you the control and strong data governance you want. For example, when granting access to users, you can set up a filter that restricts the rows they can see to a subset of the data. You can let users get into the data and look around, aggregate, make charts, and more without worrying that they’re somehow going to backdoor their way into information from another department. Secondary-level access to a Dataset can also be enabled for a class of users who don’t create reports at all; they just access the data views you’ve set up.
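
As a conceptual illustration of how a user-based row filter works (a minimal Python sketch, not Informer’s actual configuration syntax; the dataset, roles, and column names are assumptions):

    import pandas as pd

    # Hypothetical dataset; column names are illustrative only.
    orders = pd.DataFrame({
        "department": ["Sales", "Sales", "HR", "HR"],
        "amount":     [1200, 450, 300, 980],
    })

    # Each role maps to a predicate restricting which rows are visible.
    ROW_FILTERS = {
        "sales_analyst": lambda df: df[df["department"] == "Sales"],
        "hr_analyst":    lambda df: df[df["department"] == "HR"],
    }

    def visible_rows(df, role):
        """Return only the rows the given role is allowed to see."""
        return ROW_FILTERS[role](df)

    print(visible_rows(orders, "sales_analyst"))  # Sales rows only

Because the restriction lives with the Dataset itself, every report a user builds on top of it inherits the same row-level rules automatically.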

Informer 5 Extensibility and Integration Points

A lot of the restrictions and limitations that are found in enterprise software applications are eliminated when software is designed using:

  • extensibility as a priority, and
  • an open-source technology stack that provides an API structure offering easy customization and expansion.

Extensibility, defined as the ability to easily make changes to system functionality, is enabled in a safe way within Informer 5 because of the large variety of hooks we have programmed on the server side and in the user interface. Other BI companies have likely not gone to the extent we have with hooks in Informer.

Extensibility and integration points might not be at the forefront of buyers’ thinking when searching for a BI tool to purchase directly from a software provider. However, when organizations meet with VARs and integrators to discuss their requirements for a BI solution, extensibility and flexibility are often highlighted as important criteria.

Examples of Informer 5’s extensibility and integration points include:

  • Using Informer 5 Flow extensions, which incorporate company-specific ETL logic, you can update records with different information as they move through the system.
  • Data can be pushed into the system via Informer 5’s REST API. If you have data you want to track and analyze that’s coming from some other location, you can use the API to bring it in. One example is IoT, where connected devices, like sensors, are continuously collecting and streaming information. In this case you’re collecting data on the fly: instead of refreshing a Dataset, you’re pushing data into it. An adapter can be written that takes that data and channels it into an Informer Dataset dedicated to collecting the sensor data, with code that uses our API to publish the information directly into Informer (a sketch of this pattern follows the list).
  • Informer 5 provides integration touchpoints designed for local deployments. For example, integrate Informer with your LDAP or Active Directory security to do single sign-on.
  • Add custom visuals to your dashboards to achieve greater insights. For example, we quickly deployed stadium mapping visuals for a customer, where the arena maps were bound to data pulled from their database. We were able to achieve that in a short amount of time because of Informer 5’s architecture.
  • Informer 5 provides collaboration integration that enables you to develop valuable content within Informer and share it across your organization. You can link to reports, visuals, images, and more.
  • From the perspective of Entrinsik’s business partners, VARs, and integrators, Informer 5’s extensibility provides many ways to adapt the system within different customer environments. Informer’s user interface and backend can be rebranded and folded into another application and distributed and licensed that way. Informer’s licensing model is also flexible and compatible with how our partners license their app.
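
As a rough sketch of that adapter pattern, here is what the pushing code might look like; the endpoint URL, payload shape, and token below are illustrative assumptions, not Informer’s documented API:

    import requests

    # Hypothetical endpoint and credentials for a sensor-data Dataset.
    INFORMER_URL = "https://informer.example.com/api/datasets/sensor-data/rows"
    API_TOKEN = "your-api-token"  # issued by your Informer administrator

    def push_sensor_reading(device_id, temperature, recorded_at):
        """Push one sensor reading into the Dataset as it arrives."""
        response = requests.post(
            INFORMER_URL,
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            json={"device_id": device_id,
                  "temperature": temperature,
                  "recorded_at": recorded_at},
            timeout=10,
        )
        response.raise_for_status()

    push_sensor_reading("sensor-042", 21.7, "2024-01-15T09:30:00Z")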

In summary, when researching a new Business Intelligence and Data Analytics solution, your organization will benefit tremendously by selecting a system that includes:

  • indexed Datasets with granular control provided by user-based filters
  • extensibility that offers the flexibility and integration you will want as you look to do more and more with the system.

To learn more about the functionality mentioned here or any other Informer functionality, please contact Informer Support with questions or to schedule training.


Functionality That Differentiates Informer 5

Entrinsik’s data analytics and BI platform, Informer, radically simplifies the process of accessing, cleansing, blending, and analyzing disparate data by using a single platform to create a cohesive, curated, governed data hub for self-service data analysis across organizations. Informer facilitates self-service reporting using Informer Datasets and Filtering, and it minimizes self-service reporting errors because you start from a common baseline of curated data: a Dataset within Informer. The platform also facilitates collaboration, enabling people to comment on reports, share insights, and more. There are several areas of functionality that our customers find highly beneficial and that differentiate Informer from other self-service reporting solutions.

  • Saved Filters. If you can look at a Dataset and write a filter – which you can do easily with Informer – you can save that filter for use when creating a report. The filter represents important criteria you’ve established for pulling in the data you want from your Dataset. Maybe you need “current transactions for today” or “transactions of employees that live in a certain state.” Such filters are available to all users who have access to the Dataset. You can define very complicated filters for users, and they simply apply them with a couple of clicks.
  • Informer Discover. Informer’s Discover feature is ideal for self-service reporting. Often, users have a specific question to ask of their data. They log on to the reporting tool, get the answer, and they are done. Much of the time, however, users don’t have a specific question to ask; they want to see significant trends or groupings in the data but may not know where to start looking. Discover solves that by enabling you to click on specific columns like “state” and “order amount,” and Informer immediately displays charts relevant to those aggregations. Users then have an immediate visualization to interact with, which helps them tell whether those aggregations are significant.
     
    For example, let’s say you click on “state” and get a distribution of all the orders for the last three days, and the numbers are roughly the same per state. Maybe the geographic distribution of those orders is not significant. But maybe your business plan spends marketing dollars evenly across certain states per capita, and you see that sales are a lot better in one of those states; you click on that state and see the numbers are skewed a lot more there. Your curiosity is piqued, and you start thinking this is something to investigate. You may even find it worthy of a report, all by simply clicking a button and immediately viewing the trend. People may not necessarily know what questions to ask when conducting data analysis; that’s not unusual with self-service reporting. However, the answers they start seeing with Discover may stimulate additional questions to ask.
     
    If you click on salespeople in Discover, you may notice their sales are spiking. From there you can further aggregate, drill down into the data, or even map the orders. Discover makes it easy because you can simply check a box and instantly see those aggregations.
  • Informer Collaboration. Another differentiator is Informer’s collaboration functionality. Maybe you want certain people to view your data, but you don’t want everyone looking at it because the information might not be ready for a wide audience. With Informer’s sharing capability, you can do that. For example, I may be a member of an Informer Team but know nothing about a particular Dataset. Maybe I start clicking some of the check boxes in Discover and think, “Wow, Bob’s sales are through the roof, so I’m going to create an Informer report and keep it within my Team. I’m also going to send a comment to the Team saying, ‘This needs further exploring, but my database expertise isn’t as good as some of yours. Can you look at this with me and see if there’s a way to tweak it to provide an answer for why Bob is doing so well?’”
  • Single Source of Truth. Given Entrinsik’s many years of designing and supporting BI and reporting software, we have found that self-service reporting is both a blessing and a curse. For example, someone might think they’re getting solid answers from a database query they’ve written to answer a certain question, while other people run different queries to answer the same question. Quite often, the answers differ. Who is right? Maybe the results differ because one person’s query eliminates practice orders and refunds from the set of current orders; one user knows about those nuances, whereas another might not. Having these two sets of numbers out there, each carrying equal weight, can lead to inconsistencies in the overall analysis of company data.
     
    Informer solves this problem by putting the power in your hands to curate data, so you can make sure people see what they need to see. When your Informer Dataset is curated by someone who understands its value in providing a single source of truth, you eliminate situations like, “My numbers didn’t come out the same as Bob’s numbers; why is that? I just spent countless hours trying to figure out why.” The unproductive time and money spent reconciling different answers is eliminated with Informer’s curated Datasets. You now have a single source of truth where you can say, “I’m going to make a Dataset of current orders and this is it; anyone who wants to report on orders uses this Dataset.” That delivers ROI to your organization.
     
    Another reason a curated Dataset is preferred is that no database table I’ve ever seen has perfect data in it. There are always transactions that were entered wrong, and there are always rows that don’t belong — they still exist because in the past, we ignored them. People will also tell you they take data from a reporting tool, dump it into Excel, and within Excel eliminate the rows that are wrong. There is a better way! The Informer Dataset can be “that Excel place” where you toss out the junk. In fact, one of the common uses of Informer is to find bad data in a database and clean it up.
     
    Here’s a hypothetical example: I’m the one who knows how to create Informer Datasets in our organization, and I’m going to create a Dataset based on how people are using Excel spreadsheets. I talk to people, and we all agree on what the current-orders Dataset needs to be. So I write my query for current orders. Joan in Accounting tells me, “You have to eliminate all the orders with Type X because those aren’t real orders,” so I add “take out Type X” to my Dataset criteria. She also says to take out all the orders with negative values because those are also junk, so I add that too. And maybe there’s a combination of fields that can’t be handled with a query but can be handled with an Informer Data Flow step.
  • Data Flow. A Data Flow is simply the stream of data from an initial source, often a database query, into the final Informer Dataset. A flow step is an operation that a designer can add to the Data Flow. A flow step can restructure data, add to it, run calculations on it, and clean it. Using flow steps, you can go through each row and say, “If this is true for this row, toss it out of the Dataset,” discarding records that just don’t belong. You can run calculations across columns that may span multiple data sources. You can run multiple-pass calculations. You can create scores based on multiple factors in a row. The possibilities are limitless. (A conceptual sketch of such a step follows.)
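
For illustration, here is roughly what a cleaning-and-scoring flow step does, expressed as a minimal Python sketch; this is a conceptual analogy, not Informer’s actual flow-step interface, and the field names and rules are the hypothetical ones from the example above:

    # Hypothetical rows; field names are illustrative only.
    rows = [
        {"order_id": 1, "type": "A", "amount": 250.0},
        {"order_id": 2, "type": "X", "amount": 100.0},  # not a real order
        {"order_id": 3, "type": "B", "amount": -40.0},  # junk: negative value
        {"order_id": 4, "type": "B", "amount": 900.0},
    ]

    def clean_step(rows):
        """Toss out rows that don't belong, per the agreed-upon rules."""
        return [r for r in rows if r["type"] != "X" and r["amount"] >= 0]

    def score_step(rows):
        """A multiple-pass calculation: score each row against the total."""
        total = sum(r["amount"] for r in rows)   # first pass
        for r in rows:                           # second pass
            r["share_of_total"] = r["amount"] / total
        return rows

    dataset = score_step(clean_step(rows))  # order_ids 1 and 4 survive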

In conclusion, with Informer you have oversight and governance of the data, because the curator constructed the Dataset and the rules (Filtering and Data Flows) that everyone agreed upon. The result is that you have the “gold standard” of that data for everyone conducting self-service reporting. Curation may be an ongoing process, but people don’t have to worry about what’s good and what’s bad, because the curation process takes care of that.

For more please contact sales@entrinsik.com or call 888-703-0016.

This article was written by Andrew Morovati
Informer Chief Solutions Architect


True Self-Service Reporting

Self-service reporting has long been a goal of the IT departments I consult with. They want the ability to pass tasks involving query writing and data retrieval along to the end user; basically, they want their user base to “write their own reports.” Several solutions on the market today attempt to provide such relief.

Most BI solutions or reporting tools within enterprises are connected to a database that users can query. To make this work for the user base, the IT department creates a modeling layer that represents certain subsets of tables and subsets of columns within those tables. In the most basic use cases, IT sets up security credentials for users; the end user then logs in, runs an SQL query, and gets back a grid of data to work on.

This process seems straightforward: ask the tool for the data you want, and you get it back. However, the data modeling has to be clear enough that users know which columns in the model represent which data. The modeling scheme can be simplified by making field names more understandable. That works for a simple “Person” table containing trivial columns (first name, last name, city, state) when the query is “People that live in PA”: the user just clicks on “state,” selects equals, types in “PA,” selects the output columns, and runs the report. Some tools handle self-service reporting in this simplistic way. Even more sophisticated tools that offer visualization and other reporting options still rely on the end user to know where the data is and how it’s structured.
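
Under the hood, that click-path reduces to a simple parameterized query. A minimal sketch of what such a tool might generate (the database file, table, and column names are assumptions for illustration):

    import sqlite3

    conn = sqlite3.connect("example.db")  # hypothetical reporting database

    # The user's clicks (column, operator, value, output columns) map to SQL.
    query = """
        SELECT first_name, last_name, city, state
        FROM person
        WHERE state = ?
    """
    rows = conn.execute(query, ("PA",)).fetchall()  # the grid of data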

Running More Complex Queries

Once the query gets more complex, the user is going to require more knowledge, or “domain knowledge”. Suppose the query is for specific types of transactions run by specific types of persons – let’s say sales transactions run by sales reps who also have person records. This query hops among three tables, and the hops themselves may be indirect. We may not be hopping from “Sales Transactions” to “Sales Reps” to “Person.” The query may go from “Transactions” to “Sales Rep Transaction Middle Table” to “Sales Rep” to “Sales Rep Middle Table” to “Person.” For their report to be correct, users will need to know that they’ve got to make these hops. Although this is a rather trivial situation for a database query, the user will need a certain amount of domain knowledge to create the report.
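
For illustration, the indirect version of that query might look like the sketch below; the table and column names are invented for this example, not taken from any particular schema:

    # Hypothetical join chain for "sales transactions run by sales reps
    # who also have person records"; every name here is illustrative.
    query = """
        SELECT p.first_name, p.last_name, t.amount
        FROM transactions t
        JOIN sales_rep_transaction srt ON srt.transaction_id = t.id
        JOIN sales_rep sr ON sr.id = srt.sales_rep_id
        JOIN sales_rep_person srp ON srp.sales_rep_id = sr.id
        JOIN person p ON p.id = srp.person_id
    """
    # A self-service user would need to know every one of these hops
    # to produce a correct report.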

Accessing a broader set of data is going to be intimidating for a lot of users. Fortunately, the ideal self-service reporting solution removes the requirement of knowing how tables and models interact; it simply enables you to say, “I want to find everyone from PA.” So the question is: how can you empower your users with a truly self-service solution that makes it easy for them to do all sorts of analysis and reporting?

Entrinsik offers an answer. With Entrinsik’s Informer you can create Informer Datasets — a curated, single source of truth that enables you to bring in data from each of your different Data Sources. A data domain expert would have authorization to create a very large Dataset that might include transactions, sales rep names, sales rep addresses, last year’s sales, person information, and so on. The Dataset could easily have fifty columns and millions of rows.

Informer makes filtering this data very easy by showing you which columns can be filtered to support your reporting and analytics needs. For example, you can look at a column heading and say, “This state column is what I want to filter on to create a report.” The Dataset will quickly provide a list of states, and you simply select the states you want. You don’t even know that the process involved hopping over several tables, or perhaps gathering data from an entirely different database, or even calculating values derived from other columns. All you know is the data is there.

The Open Query Challenge

Among the challenges associated with traditional self-service reporting is the open query problem. When people are permitted to run self-service database queries, they are often inclined to choose the columns they want first. Then they run a wide open, unfiltered database query – maybe during the day when production and sales transactions are going on – and the system pulls out an enormous amount of data. They scan the first page and then iterate over this until they figure out how to get the query to shrink the set of data to the scope they want. Often these queries are huge, and your database gets hit repeatedly, which results in the system slowing down due to a lot of unnecessary CPU activity on the database side. That doesn’t need to happen if the data can be offloaded into a staging solution.

As an IT strategy, data warehousing solved this issue by redirecting people away from hitting the production database. With this strategy, either a manual or an automatic infrastructure extracts data from the production database and copies it onto another database, which individuals can access for reporting purposes. However, the skillsets necessary to design the ETL processes that populate a data warehouse are typically rare. In many cases, users with domain knowledge of the production database need to collaborate with IT staff to create these warehouses. In addition, giving individual network users access to several data marts or warehouses creates a distributed security problem: controlling “who can access what” when that control is scattered across the data landscape.

Informer’s approach uses the Informer Dataset as a barrier between the user and the production database. With Datasets, designers don’t need to know sophisticated ETL techniques, and users don’t need to know much to produce sophisticated reporting. You won’t run the risk of users interfering with the production database, because users act on the Datasets only; Informer handles that load and acts as a buffer between the user and the data source. And Informer’s security model ensures “who can access what” is managed centrally.

Reporting Consistency

Another issue with self-service reporting is consistency among analytics. Suppose Leif runs his report for “Sales figures for PA sales reps,” and Isabel runs her “PA sales by salesperson” report. Often their row counts, and the subsequent aggregations (like “total sales”), will differ. This is a problem whether they report directly against the database or against a data warehouse. Informer solves this with its Datasets: once the concept of “sales transactions” is defined, and the potentially complicated filtering is agreed upon (e.g., “exclude intra-division transactions”), Leif and Isabel can both compile their reports from the same Dataset.

We have said that a data domain expert can author an Informer Dataset. Where does the data domain expert come from? Typically, larger organizations employ data domain experts — people who understand the data source schema inside and out. These individuals can construct broad Datasets (for example, “Sales Transactions”) that may be used for self-service reporting. However, even at smaller organizations, lots of people use database queries or Excel spreadsheets in their work, and typically there are users who compile the data for others to use. In a scenario like “Bob always uses this spreadsheet that he reviews and has final say on, and then he distributes it for everyone to use,” clearly the person who should be creating the Informer Dataset is Bob. Instead of cutting, pasting, positioning, and duplicating values from one data source into the spreadsheets, Bob will have a much easier time designing an Informer Dataset. Moreover, that Dataset can be secured in Informer so only the team of users who need it can see it, avoiding the security concerns of passing around unsecured spreadsheets.

One interesting Entrinsik Informer use case comes from a higher education customer. Management wanted to compare, per cohort, the number of students who were present for their second fall semester, second spring semester, their third, and so on. This was handled by a manager working with multiple Excel spreadsheets. At a certain time each semester she saved these spreadsheets as a snapshot in time for reporting purposes. Then she looked through each spreadsheet, student by student, to find their status at that time and entered the information into cells in yet another spreadsheet. It took her weeks, but it had to be done because school management wanted to analyze the information.

Informer helped by letting her upload the spreadsheets into separate Informer Datasets. A summary Dataset was then created that linked to each of the individual Datasets. Now all she does is upload the new census data and other data she gathers into the separate Datasets, and Informer builds the summary Dataset for her. She doesn’t have to do any manual work; she just runs the Dataset and it all shows up. Because she had a lot of experience working with this information, she knew what data she needed and how to get to it; she just didn’t know any way to automate it. Now she owns that Dataset for everyone to use, and they do a lot more analysis than the one thing they were doing before.

Data Latency Concerns

What about data latency? What’s an acceptable amount of latency between the production database and the reporting database for your users? What do you do when the sales folks say, for example, that they need up-to-date information now and can’t wait for the reporting database to be updated overnight? How much elapsed time is acceptable for them to pose questions of the current data and still get valid answers based on when the query ran — one hour, six hours, once a day? The answer varies case by case, but fortunately an Informer Dataset can refresh on a schedule you establish based on what is convenient for your operations. So if users need the data refreshed each hour, Informer can do that.

Keep in mind that when you’re working with trends, you don’t need 100% up-to-the-minute transactions. That’s not what trends are about; they are about broad strokes. However, for cases where someone does need to know the latest information right now, Informer has an ad hoc query facility. That doesn’t solve the problem of whether the user knows how to write a query, so only certain security roles can run these.

A Solution That Works for Everyone

An ideal solution provides for two types of updates: refreshing only the information that has changed since the last scheduled update, and updating all records. Informer supports both. Datasets can also be maintained via Informer’s REST API: rows can be pushed into the Dataset, or the Dataset may be refreshed when a certain event occurs.
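
A minimal sketch of the incremental pattern, written as generic ETL logic rather than Informer’s internals (the table, column names, and checkpoint handling are assumptions):

    import sqlite3
    from datetime import datetime, timezone

    def incremental_refresh(source_conn, dataset, last_refresh):
        """Pull only rows changed since the last refresh and upsert them."""
        changed = source_conn.execute(
            "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
            (last_refresh,),
        ).fetchall()
        for row_id, amount, updated_at in changed:
            dataset[row_id] = {"amount": amount, "updated_at": updated_at}
        return datetime.now(timezone.utc).isoformat()  # new checkpoint

    def full_refresh(source_conn):
        """Update all records: rebuild the dataset from scratch."""
        return {row_id: {"amount": amount, "updated_at": updated_at}
                for row_id, amount, updated_at in source_conn.execute(
                    "SELECT id, amount, updated_at FROM orders")}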

With Entrinsik Informer Datasets, you can provide true self-service reporting to all of your employees and enable them to find new and more productive ways to improve operations and achieve your goals.

For more please contact sales@entrinsik.com or call 888-703-0016.

This article was written by Andrew Morovati
Informer Chief Solutions Architect


How to Best Design Datasets

We recently had a post on the Informer Community web page asking how best to design Datasets. There are a couple of alternatives worth discussing.

Datasets bring the concept of data modeling to the forefront. Designing your Dataset requires much broader thinking than designing a single report. When designing Datasets, think of data in categorical terms. For example, if I am a manufacturer, I might have an Inventory Dataset and a Sales Dataset. Each of those Datasets would include all the data points I might need when reporting on or analyzing the data associated with that specific part of the business.

You also want to think about how fields derived from Data Flows fit into your design. With Data Flows, data doesn’t have to come only from your source database; you can also create new data with values derived from existing data residing in your Dataset.

There are a few questions to consider when designing Datasets:

  • Should I incorporate everything into one big Dataset, or should I break it down into smaller Datasets?
  • Which approach would be best for this particular situation? There are pros and cons to each.

The question raised on the Informer Community site was from a college that keeps a record of their alumni and all the alumni gifts/donations the institution receives. They wanted to know if they should:

  1. build just an alumni Dataset that combines alumni, alumni gifts/donations, and demographic information pulled from their database
  2. or, take a two-step approach whereby they first build a broader Dataset that includes alumni, alumni gifts/donations, demographic information pulled from their database, students, staff, faculty, etc., and then build a separate alumni Dataset which would pull the necessary data from the broader Dataset.

There are times when it might be useful to create separate Datasets in a two-step approach. Having a separate Dataset containing the broader information creates a single source of truth. It also provides for consistent reporting and analysis, since any data manipulation or scrubbing needs to be done only once. And having a broader Dataset from which to pull data enables analysis and reporting to be conducted on just the demographic data it contains. Dataset designers would not need to remember to add the data-cleansing logic to each Dataset, because the Dataset structure pulls data from this broader Dataset.

The drawback to the separate-Dataset approach is that you have multiple Datasets to manage. How do you schedule them to be refreshed? How often should they be refreshed? If a Dataset is rather large, refreshing it frequently can be challenging, and its size could limit how often you want to refresh it. For example, if it takes 30–45 minutes to refresh, you wouldn’t want to trigger a refresh every 30–45 minutes; you’d need to give it time to finish before starting again. And if one Dataset refreshes at a different rate than the Datasets it pulls data from, they can get out of sync. For example, if the alumni Dataset is refreshed more frequently than the broader Dataset, you could have an instance where alumni data exists but the corresponding record in the broader Dataset does not (or vice versa).

By using Informer Jobs, you can schedule all 3 Datasets to refresh in a single Job, so you don’t have to worry about the timing for when each Dataset is refreshed.

A good situation for using separate Datasets would be where you have multiple one-to-many relationships in your data. For example, let’s say you have a CRM (Customer Relationship Management) database that includes all of your accounts and each account has more than one contact person. That creates a one-to-many relationship between the account and the contacts. Accounts can also be tied to multiple sales opportunities. If we combine data from accounts, contacts, and opportunities into a single Dataset, we create a Cartesian effect. For example, if we have an account with 3 contacts and 2 opportunities, the resulting Dataset would have 6 records (3 contacts x 2 opportunities) listed for the one account. This can make manipulating and displaying the data rather tricky because you end up with duplicates.
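
The effect is easy to demonstrate. A small sketch using pandas with made-up records:

    import pandas as pd

    # One account (id 1) with 3 contacts and 2 opportunities.
    contacts = pd.DataFrame({"account_id": [1, 1, 1],
                             "contact": ["Ann", "Ben", "Cal"]})
    opportunities = pd.DataFrame({"account_id": [1, 1],
                                  "opportunity": ["Renewal", "Upsell"]})

    # Flattening both one-to-many relationships into a single table
    # multiplies the rows: 3 contacts x 2 opportunities = 6 records.
    combined = contacts.merge(opportunities, on="account_id")
    print(len(combined))  # 6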

To prevent the Cartesian effect, you could create separate Datasets for accounts, contacts, and opportunities. You would then use a Flow Step to pull the records from the contacts and opportunities Datasets into the accounts Dataset. This makes for a much cleaner Dataset with less to manipulate.

When it comes to the frequency of Dataset updates (or refreshes), there are a couple of things to consider:

  • How often is the data in the database actually changing? Frequently changing data may need to be refreshed more often than data that isn’t changing very often, if at all (historical data for example).
  • How big is the resulting Dataset? This will impact the time it takes to run the underlying query. If it takes 30 minutes for the query to run, the Dataset should not be refreshed every 20 minutes.
  • Does the Dataset query require a lot of resources from the database server? A query that is taxing on resources (such as processor or memory) might need to be run after business hours to prevent performance impacts on other applications that use the database.

Ultimately the goal of a Dataset is to:

  • provide data that is as fresh as users need for comprehensive reporting and analysis, without requiring them to constantly query the database for the latest data
  • provide a single source of truth for all reporting and analysis to users.

For more please contact sales@entrinsik.com or call 888-703-0016.
