Data cleaning and screening

The quality of your data depends on data cleaning and data screening.

Cleaning your collected data involves several tasks. You need to remove or correct outliers (extreme points or values), including multivariate, time-series, distributional and geometric outliers. You also need to fix typos, handle missing or ambiguous data, impute missing values where appropriate, deal with skewness (positive or negative) and kurtosis (e.g. platykurtosis), and de-identify your data to comply with the ethical guidelines of your country.

Non-normal data need particular attention when your analysis relies on the normality assumption, as do bimodal distributions. You can transform or recode a variable to bring it closer to normal, for example with a base-10 logarithm (LG10), a square root (SQRT), an inverse (1/x), or a reflection (reversing the scores, e.g. (largest value + 1) - x, usually followed by one of the other transformations when the variable is negatively skewed).

Graphical checks (scatter plots, bar charts, histograms, box plots, Q-Q and P-P plots, residual plots) and formal statistics (the Kolmogorov-Smirnov and Shapiro-Wilk tests for normality, Mahalanobis distance for multivariate outliers, Mardia's statistic for multivariate normality) help you spot outliers and typos and check assumptions such as normality, homoscedasticity, the absence of multicollinearity, and the absence of autocorrelation.
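As a rough illustration, the following Python sketch applies a few of these cleaning steps to a single numeric variable using only the standard library. It imputes missing values with the median (one simple choice among many; multiple imputation is often preferable), flags outliers with Tukey's 1.5 × IQR fences, measures skewness, and applies a log10 transform or a reflection. The function names, the 1.5 multiplier, the constant added before taking logs, and the sample data are all illustrative assumptions, not a prescribed procedure.

```python
import math
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Uses Python's default 'exclusive' quartile method; other software
    may compute quartiles slightly differently.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def skewness(values):
    """Sample skewness (adjusted Fisher-Pearson coefficient, G1)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def log10_transform(values):
    """Base-10 log for positive skew.

    Assumption: if any value is <= 0, a constant is added first so that
    the smallest value becomes 1 (log10 is undefined for values <= 0).
    """
    lowest = min(values)
    shift = 1 - lowest if lowest <= 0 else 0
    return [math.log10(v + shift) for v in values]


def reflect(values):
    """Reflect scores as (largest value + 1) - x, e.g. before a log
    transform of a negatively skewed variable."""
    top = max(values)
    return [top + 1 - v for v in values]


# Hypothetical data: two missing entries and a suspicious value of 120
raw = [12, 15, None, 14, 18, 13, 16, None, 120, 17, 15, 14]
cleaned = impute_median(raw)
print(iqr_outliers(cleaned))  # [120]

# Hypothetical positively skewed variable: compare skewness before/after log10
skewed = [1, 2, 3, 4, 5, 10, 20, 50, 100]
print(round(skewness(skewed), 2), round(skewness(log10_transform(skewed)), 2))
```

Flagged values should be inspected rather than deleted automatically: the 120 here could be a typo for 12 or a genuine extreme score. For formal normality tests, SciPy provides `scipy.stats.shapiro` and `scipy.stats.kstest`.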

The next step is to analyze the data in the right way. The results of the statistical analysis must be interpreted in the right context, with their limitations in mind, and should point to directions for future research that could extend their usefulness.