Published on March 28, 2018

Things You Should Know About Data Preparation for Analytics and Visualization

Data has been called the raw material of business and the revolution of the 21st century. The volume of information used in companies, research, and development is enormous, and it keeps growing. It becomes increasingly hard for a user to extract the key message from this data lake.

Usually, the answer to most of these problems is a report or perhaps some fancy graphs. However, that is not sufficient. People need information they can drill into further - data that enables them to make better decisions and improve execution.

Einstein Analytics, a robust BI platform integrated with Salesforce, can visualize your data to give you a better understanding of it. Building on that, AppShark, a Salesforce Silver Partner, can help translate your data into actions that raise more funds and deliver better programs for your organization. Several steps are involved in converting data into action.

Is Your Data Ready for Analytics?

The first things to consider are data quality, structure, and level of detail. Recently, I was involved in a big data reporting project where the data was reaching the upper limits of the environment. Because of time and cost constraints, we had to work within the existing environment, with no option to scale out. To complete the task on time and under budget, we had to solve the problem by working with the data itself. At the successful conclusion of the project, I decided to document the steps we took to transform the data.

The list is not intended to be comprehensive; rather, treat it as a starting point to help you get the most out of your data and environment while being a good steward of those valuable IT assets.

Data Transformation Steps to Consider

Remove unused columns

Sometimes you carry data along simply because it is part of the original data source. Data visualization is an excellent way to gain insight into your data and understand what may be of value to your analysis.

Visualizations such as descriptive statistics, box plots, and scatter plots show the distribution of the data (mean, median, and mode) and can give you a better understanding of its components.
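As a rough illustration outside of Einstein Analytics, here is a minimal pandas sketch of the idea (the column names are hypothetical): inspect each column's distribution, then drop the columns that add nothing to the analysis.

```python
import pandas as pd

# Hypothetical sample data standing in for an extracted Salesforce object.
df = pd.DataFrame({
    "AccountId": ["001A", "001B", "001C", "001D"],
    "Amount": [1200.0, 5400.0, 980.0, 3100.0],
    "LegacyCode": ["X1", "X1", "X1", "X1"],   # carried over from the source, never used
    "InternalFlag": [None, None, None, None], # entirely empty
})

# Descriptive statistics reveal which columns actually carry signal.
print(df.describe(include="all"))

# Drop columns that are constant or empty -- they add size but no insight.
unused = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
df = df.drop(columns=unused)
print(df.columns.tolist())  # ['AccountId', 'Amount']
```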

Decrease the length of fields

Often a field's length is set by the single longest value it contains. Evaluate how you will actually use the information; you may be able to reduce the field's length.

In my project, there were multiple fields with very long values. By shortening those values, I was able to reduce the field lengths. Given that my dataset contained more than 50 million records, this step alone decreased the dataset size significantly.
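As a sketch of the same idea in pandas (with hypothetical field names), truncating a free-text field to the length the report actually uses caps the storage cost per row:

```python
import pandas as pd

# Hypothetical records with an over-long free-text field.
df = pd.DataFrame({
    "CaseId": ["500A", "500B"],
    "Subject": ["Printer jams every time the tray is refilled and the cover is closed",
                "Login fails intermittently after a password reset on mobile devices"],
})

# If reports only ever show the first 40 characters, storing more is wasted space.
MAX_LEN = 40
df["Subject"] = df["Subject"].str.slice(0, MAX_LEN)

print(df["Subject"].str.len().max())  # <= 40
```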

Decrease the cardinality of the dataset

High cardinality can degrade performance, so it is wise to assess your data and reduce cardinality wherever possible. Some fields naturally have high cardinality because of their unique nature, such as Client ID or Employee ID.

Other fields carry high cardinality that serves no purpose, and those are the ones to focus on. In my project, there was a measure that was the result of a Salesforce Analytics Query Language (SAQL) calculation. The original dataset carried this value out to ten decimal places. We did not need that precision or detail in our report, and it led to very high cardinality. Rounding this field down to two decimal places removed 90% of the high cardinality for this calculation.
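In our project the rounding was done in the SAQL calculation itself; purely to illustrate the effect on cardinality, here is a pandas sketch with synthetic data:

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(7)

# A measure carried out to many decimal places, as in the project described above.
df = pd.DataFrame({"Score": rng.random(50_000) * 100})

print(df["Score"].nunique())        # ~50,000 distinct values
df["Score"] = df["Score"].round(2)  # two decimal places is all the report needs
print(df["Score"].nunique())        # at most 10,000 distinct values
```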

Prepare multiple datasets to work around the multi-value limitation

In my project, stage analysis required several different Salesforce objects, which brought the augment transformation into the picture. However, some of those objects contained multi-value picklist fields, and augmenting on them dropped records from the dataset. To overcome this problem, we built separate datasets instead of augmenting, as sketched below.
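The augment transformation is specific to Einstein Analytics, but the shape of the problem can be shown in pandas terms (hypothetical objects and field names): combining objects through a multi-value field either drops records or multiplies them, so stage counts should come from the opportunity dataset on its own.

```python
import pandas as pd

# Hypothetical opportunities with a multi-value picklist stored "A;B" style.
opps = pd.DataFrame({
    "OppId": ["006A", "006B"],
    "Stage": ["Prospecting", "Closed Won"],
    "Products": ["Licenses;Support", "Licenses"],
})

# Flattening the multi-value field before combining objects multiplies rows,
# which skews any count done on the combined dataset:
flat = opps.assign(Product=opps["Products"].str.split(";")).explode("Product")
print(len(opps), "opportunities ->", len(flat), "rows after flattening")  # 2 -> 3

# Keeping separate datasets avoids the distortion: stage analysis runs against
# the opportunity dataset alone, product analysis against the flattened one.
print(opps.groupby("Stage").size())
```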

Extract each object once to build an optimized dataset

When working with big data, each object should be extracted only once, with incremental updates thereafter, to eliminate duplicate copies of the same data across datasets.
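Einstein Analytics can handle this on the extract step of the dataflow; as an illustration of the underlying upsert idea, here is a pandas sketch with hypothetical records:

```python
import pandas as pd

def incremental_update(existing: pd.DataFrame, changed: pd.DataFrame) -> pd.DataFrame:
    """Upsert changed records into the previously extracted dataset, keyed on
    the record Id, so the object is extracted in full only once."""
    merged = pd.concat([existing, changed])
    # Keep only the latest version of each record to avoid duplicates.
    return merged.drop_duplicates(subset="Id", keep="last")

# Hypothetical first full extract, followed by a small delta of changes.
full = pd.DataFrame({"Id": ["a01", "a02"], "Amount": [100, 200]})
delta = pd.DataFrame({"Id": ["a02", "a03"], "Amount": [250, 300]})

print(incremental_update(full, delta))
# a01 keeps its old value, a02 is updated, a03 is appended -- no duplicates.
```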

Date timestamp handling in analytics

Our client uses the Pacific time zone in Salesforce, and dates are stored in Salesforce as UTC. However, the Einstein Analytics UI does not take the user's time zone into account. In this case, we had to create a new DateTime field using the toDate() function in SAQL.
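In our project the new field was computed with SAQL's toDate(); the underlying adjustment, shown here in Python purely for illustration, is a UTC-to-Pacific conversion. Note how the calendar date itself can shift:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

# Salesforce stores DateTimes in UTC; the report needs Pacific wall-clock time.
closed_utc = datetime(2018, 3, 28, 1, 30, tzinfo=timezone.utc)
closed_pacific = closed_utc.astimezone(ZoneInfo("America/Los_Angeles"))

print(closed_utc.isoformat())      # 2018-03-28T01:30:00+00:00
print(closed_pacific.isoformat())  # 2018-03-27T18:30:00-07:00 (previous day)
```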

Figure 1. Schematic Diagram of Data Preprocessing

The robustness, performance, and agility of Einstein Analytics enable customers to visualize and analyze vast amounts of data and report on it. Data transformation may not always be necessary, but the guidelines listed above can help you improve your data quality and decrease your dataset size. Use them not as a complete checklist, but as a starting point to help you visualize and interpret your data.

Related Posts

- Highlights of Significant Changes in Einstein Analytics Winter ’18 Release

- Data Transformation Using Dataflow Builder and Importing It into Einstein Analytics

- What You Need to Know About Salesforce Einstein