Ways of Transforming and Integrating Data Into Einstein Analytics

After my first blog post, which covered my journey into Salesforce Einstein, I had the opportunity to train on several Salesforce Einstein products. Today's post is about how to get started with Einstein Analytics. To use any business intelligence (BI) product, we need data.

The first step is to prepare and integrate the data so that the BI tool can visualize it and give a better understanding of a customer's business.

There are several ways to transform and integrate data into Einstein Analytics:

Data preparation using Dataset Recipes.
Data Integration with Dataflow.
Data Integration with Dataset Builder.
Data Integration with Wave Connector.
Creating the dataset with External Data API.
Upload User Interface.

Each of these methods of getting data into Einstein Analytics will be covered in this blog series. Transferring datasets into Einstein Analytics is straightforward. Salesforce data can be transferred into Einstein Analytics with the dataset builder, whereas external datasets (.csv, .zip, and .gz files) can be pushed into Einstein Analytics in two ways: the External Data API and the Upload User Interface.

Data in a Microsoft Excel file can be pushed into Einstein Analytics by installing the Wave Connector from the Microsoft app store. Once datasets have been imported, combining them with existing datasets to create a new one could be cumbersome. Einstein Analytics simplifies this with two methods, dataflow JSON and dataset recipes, both of which let you take data from various sources and create a new dataset (Figure 1).

Figure 1: Connecting data in Einstein Analytics

While creating a new dataset, you can apply filters and transform your data (for example, Augment, Merge, Compute Delta, and Dimension to Measure) to dig deeper into the data and measure key performance indicators (KPIs).

In this post, I am going to talk about dataset recipes for preparing a dataset in Einstein Analytics. A dataset recipe lets you explore the data in existing datasets, transform it, and output the results to a new dataset called the target dataset (Figure 2).

Figure 2: Prepare Data in Dataset Recipe

The primary features of a dataset recipe are combining data from different datasets and adding filters, buckets, and formula fields. You can also cleanse the data by applying the transformation suggestions in the recipe. When you build a recipe, you specify the steps, or transformations, that you want to perform on a particular source dataset. When you run the recipe, it applies these transformations and writes the results to a new dataset called the target dataset. The steps involved in a dataset recipe are illustrated in Figure 3.

Figure 3: Steps involved in Dataset Recipe
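To make the idea of "steps applied to a source dataset" concrete, here is a minimal sketch in plain Python. It is not how Einstein Analytics implements recipes; the function names (run_recipe, keep_won, uppercase_name) and the sample rows are made up for illustration. It only shows the shape of the process: an ordered list of steps, each taking rows and returning rows, with the source left untouched.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def run_recipe(source: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order and return the rows of the target dataset."""
    rows = [dict(r) for r in source]  # copy so the source dataset is untouched
    for step in steps:
        rows = step(rows)
    return rows


def keep_won(rows: List[Row]) -> List[Row]:
    # Hypothetical filter step: keep only won opportunities.
    return [r for r in rows if r["StageName"] == "Closed Won"]


def uppercase_name(rows: List[Row]) -> List[Row]:
    # Hypothetical transform step: uppercase the Name field.
    return [{**r, "Name": str(r["Name"]).upper()} for r in rows]


# Illustrative sample data, not taken from a real org.
source = [
    {"Name": "Acme renewal", "StageName": "Closed Won", "Amount": 1200},
    {"Name": "Globex pilot", "StageName": "Prospecting", "Amount": 300},
]
target = run_recipe(source, [keep_won, uppercase_name])
print(target)
```

The target here contains one row, the won opportunity with its name in uppercase, while the source list is unchanged.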

Create Recipe

You can create, transform, run, and schedule dataset recipes using Data Manager. In Einstein Analytics, click the gear icon and then click Data Manager (Figure 4). Data Manager opens in a new tab. In Data Manager, click the Prepare tab, which lists the recipes you have created previously.

Figure 4: Procedure to Create Dataset Recipe

The preview tab shows the number of fields and rows. Click the preview indicator (Figure 5) to see all the fields from your base dataset, which are selected by default, and then click Apply.

Figure 5: Real-time Preview of Dataset

Add Data / Augment

The "data source" option is a new feature in the Summer '17 release of Einstein Analytics. In a recipe, it is used to add another dataset, including a replicated dataset if you have replicated data available. You can map two data sources on multiple keys, which together act as a composite key. This is particularly worthwhile if you do not have a single unique ID to match the data sources. Alternatively, you can, for instance, map the opportunity ID with the owner ID or match by name (Figure 6).

Figure 6: Lookup between the base data and the lookup data
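Matching on multiple keys is easier to see in code. The following is a rough sketch of a lookup on a composite key, written in plain Python rather than anything recipe-specific. The function name augment, the field names, and the sample rows are assumptions for illustration. The choices to take the first matching lookup row and to fill unmatched rows with None are mine; the actual behavior in Einstein Analytics may differ.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def augment(
    base: List[Row],
    lookup: List[Row],
    base_keys: Sequence[str],
    lookup_keys: Sequence[str],
    select: Sequence[str],
    prefix: str,
) -> List[Row]:
    """Add selected lookup fields to each base row, matching on several keys."""
    index: Dict[Tuple[object, ...], Row] = {}
    for row in lookup:
        key = tuple(row[k] for k in lookup_keys)
        index.setdefault(key, row)  # assumption: first match wins

    result: List[Row] = []
    for row in base:
        key = tuple(row[k] for k in base_keys)
        match: Optional[Row] = index.get(key)
        merged = dict(row)
        for field in select:
            # assumption: unmatched base rows get None for the added fields
            merged[f"{prefix}.{field}"] = match[field] if match else None
        result.append(merged)
    return result


# Illustrative data: no single unique ID, so match on name plus region.
opportunities = [
    {"Id": "006A", "AccountName": "Acme", "Region": "EMEA", "Amount": 500},
    {"Id": "006B", "AccountName": "Acme", "Region": "APAC", "Amount": 900},
    {"Id": "006C", "AccountName": "Globex", "Region": "EMEA", "Amount": 100},
]
accounts = [
    {"Name": "Acme", "Region": "EMEA", "Owner": "Dana"},
    {"Name": "Acme", "Region": "APAC", "Owner": "Lee"},
]
augmented = augment(
    opportunities, accounts,
    base_keys=["AccountName", "Region"],
    lookup_keys=["Name", "Region"],
    select=["Owner"],
    prefix="Account",
)
for row in augmented:
    print(row["Id"], row["Account.Owner"])
```

Matching on the name alone would be ambiguous here, because "Acme" appears twice in the lookup data. Adding the region as a second key resolves that.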

Filter Data

Filter Data in a dataset recipe restricts the dataset to the rows that meet specific criteria. For example, if you want to see only won opportunities, you can do this with the Filter Data option in the recipe. It is especially useful when you are using global filters on your dashboard.
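As a small illustration of the won-opportunities example, here is a sketch in plain Python. The helper filter_rows and the sample rows are hypothetical, and the stage value "Closed Won" is simply the usual stage name for a won opportunity. The recipe UI does the equivalent without any code.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def filter_rows(rows: List[Row], field: str, allowed: Iterable[object]) -> List[Row]:
    """Keep only the rows whose value for `field` is one of the allowed values."""
    allowed_set = set(allowed)
    return [r for r in rows if r.get(field) in allowed_set]


# Illustrative sample data.
opportunities = [
    {"Name": "Acme renewal", "StageName": "Closed Won"},
    {"Name": "Globex pilot", "StageName": "Prospecting"},
    {"Name": "Initech upsell", "StageName": "Closed Won"},
]
won = filter_rows(opportunities, "StageName", ["Closed Won"])
print([r["Name"] for r in won])
```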

Add Bucket

Bucket fields are not a new concept in Salesforce; they already exist in Salesforce reports, and the Summer '17 release improves them slightly. With the Bucket feature, you can create new values based on existing ones by grouping the values of measures, dimensions, and dates. For instance, a bucket field can categorize opportunities as low, medium, or high, or group accounts by product type. The new date option also lets you group data on a yearly, quarterly, monthly, weekly, or daily basis.
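Here is a minimal sketch of both kinds of bucketing, again in plain Python. The thresholds (10,000 and 50,000) and the function names are invented for the example, and the quarter label format is my own choice, not what Einstein Analytics produces.

```python
from datetime import date


def bucket_amount(amount: float, medium_from: float = 10_000, high_from: float = 50_000) -> str:
    """Bucket a measure into Low / Medium / High using hypothetical thresholds."""
    if amount >= high_from:
        return "High"
    if amount >= medium_from:
        return "Medium"
    return "Low"


def bucket_quarter(d: date) -> str:
    """Bucket a date by calendar quarter, e.g. 2017-Q3 (label format is an assumption)."""
    quarter = (d.month - 1) // 3 + 1
    return f"{d.year}-Q{quarter}"


print(bucket_amount(5_000), bucket_amount(10_000), bucket_amount(75_000))
print(bucket_quarter(date(2017, 8, 15)))
```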

Formula Field

A formula field is used to calculate new values from existing fields' values in your dataset using math functions and operators.
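As an illustration of the idea, not of the recipe's own formula syntax, here is a Python sketch that derives a new field from two existing ones. The helper add_formula_field, the field name Expected, and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def add_formula_field(rows: List[Row], name: str, formula: Callable[[Row], object]) -> List[Row]:
    """Return new rows with one extra field computed from each row's existing values."""
    return [{**r, name: formula(r)} for r in rows]


# Illustrative sample data: amount and win probability in percent.
rows = [
    {"Name": "Acme renewal", "Amount": 2000.0, "Probability": 25},
    {"Name": "Globex pilot", "Amount": 800.0, "Probability": 50},
]
with_expected = add_formula_field(
    rows,
    "Expected",
    lambda r: round(r["Amount"] * r["Probability"] / 100, 2),
)
print([r["Expected"] for r in with_expected])
```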

Transform Suggestions in Dataset Recipe

I like the transformation suggestions that recipes offer. The following transformations can be applied to fields:

Convert to Uppercase: Convert values to uppercase in selected fields
Convert to Lowercase: Convert values to lowercase in selected fields
Split: Divide values into different parts and create a new field
Substring: Extract part of a value, for example the month from a date
Find and Replace: Find a particular value and replace it with a new value in the field
Dimension to Measure: Convert dimensions into numerical values
Extract Date Component: Extract a certain component from a date field into a new field (added in the Summer '17 release)

The various types of Transformations listed above are illustrated in Figure 7.

Figure 7: Transformations suggested by Dataset Recipes, shown on the right-hand side
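For readers who think in code, the transformations listed above map roughly onto the following plain Python functions. This is only a sketch of what each transformation does to a single value. The function names and sample values are mine, and details such as how Einstein Analytics handles nulls or invalid numbers are not modeled.

```python
from datetime import date
from typing import List


def to_upper(value: str) -> str:
    return value.upper()


def to_lower(value: str) -> str:
    return value.lower()


def split(value: str, delimiter: str) -> List[str]:
    # Divide a value into parts; each part could become its own field.
    return value.split(delimiter)


def substring(value: str, start: int, length: int) -> str:
    # Zero-based start position (an assumption of this sketch).
    return value[start:start + length]


def find_replace(value: str, find: str, replacement: str) -> str:
    return value.replace(find, replacement)


def dimension_to_measure(value: str) -> float:
    # Raises ValueError for non-numeric text; the real feature may behave differently.
    return float(value)


def extract_date_component(d: date, component: str) -> int:
    # component is one of "year", "month", "day"
    return getattr(d, component)


print(to_upper("acme"), to_lower("ACME"))
print(split("San Jose, CA", ", "))
print(substring("2017-08-15", 5, 2))          # month part of a date string
print(find_replace("N/A", "N/A", "Unknown"))
print(dimension_to_measure("42"))
print(extract_date_component(date(2017, 8, 15), "month"))
```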

Navigate Field

Once you add data and apply bucket fields, formula fields, filters, and transformations, the number of fields in your recipe preview grows, and it becomes harder to find the particular field you want to work with. Navigate Field helps you quickly search for the field you need. It also hides the fields you don't want to see.

Save, Run, and Reschedule Recipe

The Save option in a dataset recipe lets you save the recipe without creating a target dataset. Thus, you can edit the recipe later without creating multiple target datasets.

By contrast, the Run option creates the target dataset when the recipe is new, or updates the target dataset when the recipe already exists. By default, a dataset recipe doesn't run automatically. Before it can run on a schedule, you have to run it manually once.

After that first run, the recipe runs on a daily schedule at a set time, rather than on an hourly, monthly, or specific-day schedule, so that the target dataset picks up recent changes in the Salesforce data and modifications to formulas. Furthermore, you can stop, reschedule, and monitor a dataset recipe at any time.

Limitation

A dataset recipe cannot be deployed, nor can you access its JSON file. Transferring a recipe from your sandbox to a production org therefore means recreating it manually in the production org, which is tedious. For that reason, we can use dataflow JSON instead of recipes when the dataset logic has to move between orgs. Recipes clearly still need a few improvements, and we hope to see more advanced features in future releases.
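Because a dataflow is defined in JSON, its definition can be kept as a text file and moved between orgs. Below is a rough sketch of what a small dataflow definition might look like, built as a Python dictionary and serialized with the standard json module. The node names, the dataset alias, and the chosen fields are hypothetical. The action names and parameters follow my understanding of the dataflow format (sfdcDigest, augment, sfdcRegister) and should be checked against the Salesforce documentation before use. The helper missing_sources is my own addition, a simple sanity check that every node a step refers to actually exists.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical dataflow: extract two objects, join them, register the result.
dataflow: Dict[str, dict] = {
    "Extract_Opportunity": {
        "action": "sfdcDigest",
        "parameters": {
            "object": "Opportunity",
            "fields": [{"name": "Id"}, {"name": "Name"},
                       {"name": "Amount"}, {"name": "AccountId"}],
        },
    },
    "Extract_Account": {
        "action": "sfdcDigest",
        "parameters": {
            "object": "Account",
            "fields": [{"name": "Id"}, {"name": "Name"}, {"name": "Industry"}],
        },
    },
    "Augment_Opportunity_Account": {
        "action": "augment",
        "parameters": {
            "left": "Extract_Opportunity",
            "left_key": ["AccountId"],
            "relationship": "Account",
            "right": "Extract_Account",
            "right_key": ["Id"],
            "right_select": ["Name", "Industry"],
        },
    },
    "Register_Opportunities": {
        "action": "sfdcRegister",
        "parameters": {
            "alias": "OpportunitiesWithAccounts",    # hypothetical dataset alias
            "name": "Opportunities With Accounts",   # hypothetical dataset label
            "source": "Augment_Opportunity_Account",
        },
    },
}


def missing_sources(flow: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return (node, reference) pairs where a node points at a node that does not exist."""
    missing = []
    for name, node in flow.items():
        params = node["parameters"]
        for key in ("left", "right", "source"):
            ref = params.get(key)
            if ref is not None and ref not in flow:
                missing.append((name, ref))
    return missing


dataflow_json = json.dumps(dataflow, indent=2)
print(dataflow_json)
print(missing_sources(dataflow))
```

The resulting text is what you would keep under version control and move from a sandbox to a production org, which is exactly what a recipe does not currently allow.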

In this post, we covered how dataset recipes can prepare datasets for building dashboards in Einstein Analytics. I am interested in working with other features of Einstein Analytics as well. The next post in this series will cover importing data using the various data integration methods that Einstein Analytics offers.