Datasets

 

Datasets: the Foundation for Successful Data Analytics

Create One Version of the Truth

Data exists everywhere throughout your organization: in databases, on shared network servers, in the cloud, on countless desktops. Data also exists in various formats: as records in multiple types of databases, as structured text files, as proprietary spreadsheet documents, as streams from REST calls to a 3rd party API. Informer enables you to discover business intelligence from all these disparate silos but first you’ll want to get your data into the form of an Informer Dataset.

A Dataset is the elementary building block for Informer content to perform data analytics. A Dataset is a set of indexed Records and a Record is a collection of Fields. A Field can be a simple value such as a number or text, an object, or an array of objects.

Creating Datasets From
Structured Files

You can create Datasets from existing documents by simply dragging a document with a supported file format from your desktop onto any location in Informer. See Figure 1. Informer will create a new Dataset from the file content. It’s that easy! Currently supported formats include Comma Separated Values (.csv), JSON (.json), and Excel documents (.xls, .xlsx). As with supported Datasource types, Informer will support more file format types over time.

Figure 1: Drag a file onto Informer

Informer Workspace

While dragging in an individual static document file to create a Dataset is ideal for doing quick one-off analytics, creating an Informer Workspace provides more flexible and governed access to the set of data. A Workspace is a collection of associated source documents which Informer treats as a virtual database. It enables you to create mapping associations and other standard Datasource functions just like a traditional database. Workspaces are ideal for importing multiple files for business intelligence.

Dataset Designer

Figure 2: Add Fields

Dataset Designer

The Dataset Designer makes it easy for you to create a Dataset visually by using Mappings. You do not need to be an expert! It is especially useful for those who don’t know the Datasource’s query language and/or are not intimately familiar with the schema and structure of the underlying Datasource.

After choosing a Datasource and Mapping for your query within Dataset Designer, just select Fields and Criteria — the two primary components of a Query:

  • Fields define what details you want to see about the records you’ll be retrieving from a Datasource. Datasets can be customized to show only certain Fields by adding the desired Fields within your Query. Add as many Fields as you want. You can also add Fields from other Mappings. See Figure 2.
  • Criteria enables you to pare down which records are retrieved.

For example, let’s say you have an Orders table with historical details of orders including product details, shipping details, billing details, etc. The Fields you choose for the Dataset determine what you ultimately view from the selected records. If you want a Dataset to include only those orders containing a specific product, you would add that restriction within your Criteria.

Add more Mappings if would like to. You can add a link or an SQL link:

  • Add Link – Pair up Fields between Mappings. Add as many new Fields as you’d like.
  • Add SQL Link – Join tables to create a joint Mapping.

Add Criteria to the Query

Criteria defines which records are retrieved from your Dataset for your data analytics. You can include Criteria within a Dataset or Ad-hoc Query Report. You can also choose whether you want a Criteria to be based off a Field or Value. See Figure 3.

  • You can select the condition for the Criteria, for example “exactly matches”. Select either a Value, Field, or Input to compare the first part of the condition.

Add Criteria to the Query

Figure 3: Add Criteria

Flows

For more in-depth business intelligence, Flows enables you to augment your Dataset records with additional data, metadata, or logic as your Dataset records are indexed. You can easily derive new values from your existing data as well as cleanse bad data from your Datasource. Within the Dataset, you have the options to:

  • Create new Fields that perform certain functions against one or more selected Fields from the Dataset or Fields from a different Dataset. See Figure 4.
  • Configure the Fields that you’ve added by adjusting data types, replacing Field values, removing duplicates, and splitting Fields with multiple values. See Figure 5.
  • Include advanced power scripting which adds more advanced programming capabilities to the Flow step process to further derive new values or cleanse existing data.
  • Remove Fields that you’ve added

Figure 4

Figure 4: Flow Step – Add Field

Figure 5: Flow Step – Transform

Figure 5: Flow Step – Transform

Refreshing Your Dataset

There may be occasions where you want to periodically refresh one of your Datasets. Refreshing enables you to update the data in your Dataset with current data from your Datasource, manually or automatically via an Informer Job. See Figure 6. An Informer Job is a task or collection of tasks to be executed using data in your system. These tasks can be automated using a recurrence schedule, or simply saved in the system to be manually triggered when needed.

Alternatively, you can easily append data to a Dataset without replacing all your current data. See Figure 7.

Figure 6

Figure 6 : Enabling ‘Refresh on Job run‘

Figure 7

Figure 7: Appending Data

Native SQL

If you have an existing query statement, or if you are comfortable with authoring one and simply don’t want or need to use the Dataset Designer… Native SQL is for you! Just choose a Dataset Name and a Datasource against which to execute your query. Click Query and begin typing. You’ll notice the rich editor offers autocomplete and color coding for both SQL syntax and schema as shown in Figure 8.

Figure 8

Figure 8: Native SQL Editor Autocomplete

Upon execution of your query, Informer provides progress updates and forwards you to the Dataset Explorer page containing your query result, once the results are available. You now have a Dataset that’s ready for data analytics!

Want to learn more about Informer?