Top Key Statistical Concepts for Data Science

Learn Key Statistics in Data Science with Practical Python Examples

3 min read · Nov 2, 2023

Key Statistical Concepts for Data Science | By Adegboyega Aare

Statistics is the starting point for data science. It’s the tool that helps us understand data. Whether you’re trying to figure out what’s happening in the market, make predictions about the future, or make sense of a small chunk of data, a working knowledge of statistics is essential. In this article, we’ll look at key statistical concepts and how they apply in data science, with practical Python examples to make them easier to follow.


Ready? Let’s get straight to it. You’ll thank me later!

Descriptive Statistics

Descriptive statistics summarize a dataset. They include measures of central tendency (the mean, median, and mode) and measures of spread (such as the range and standard deviation). With Python libraries like NumPy and Pandas, you can compute and display these statistics in a few lines.

Python Code Syntax:

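import numpy as np
import statistics  # np.mode does not exist; use the standard library instead

data = np.array([12, 15, 18, 21, 24, 27, 30])
mean = np.mean(data)      # arithmetic average
median = np.median(data)  # middle value
# Every value here is unique, so statistics.mode returns the first one (Python 3.8+)
mode = statistics.mode(data.tolist())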

Use Case: Calculate and display the central tendencies of a dataset to understand the distribution of student ages in a class.
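
The example above covers central tendency only. As a minimal sketch of the "how spread out the data is" part, the snippet below computes a few common measures of spread on the same values. The variable names are illustrative, and holding the data in a Pandas Series is an assumption made here for convenience.

```python
import numpy as np
import pandas as pd

# Same illustrative ages as in the example above
ages = pd.Series([12, 15, 18, 21, 24, 27, 30])

value_range = ages.max() - ages.min()      # distance between the extremes
sample_std = ages.std()                    # pandas defaults to ddof=1 (sample std)
population_std = np.std(ages.to_numpy())   # NumPy defaults to ddof=0 (population std)
q1, q3 = ages.quantile([0.25, 0.75])
iqr = q3 - q1                              # spread of the middle 50% of values

# count, mean, std, min, quartiles, and max in a single call
print(ages.describe())
```

Note that Pandas and NumPy use different defaults for the standard deviation, so the two results will not match unless you set `ddof` explicitly.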

Inferential Statistics

Inferential statistics involves drawing conclusions about a population from a sample, using methods like hypothesis testing and confidence intervals. Python’s SciPy library has tools for both.

Python Code Syntax:

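from scipy import stats
sample_data = [5, 6, 7, 8, 9]
population_mean = 7.5  # hypothesized population mean (the null hypothesis)
# One-sample t-test: does the sample mean differ from population_mean?
t_statistic, p_value = stats.ttest_1samp(sample_data, population_mean)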

Use Case: Test whether the average rating in a sample of product reviews differs significantly from the satisfaction level assumed for the entire customer population.
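
Confidence intervals are mentioned above but not shown. Here is a minimal sketch, assuming the same five sample values and a 95% confidence level (the level is an assumption chosen for illustration). It uses `stats.sem` for the standard error and `stats.t.interval` for the interval.

```python
import numpy as np
from scipy import stats

sample_data = np.array([5, 6, 7, 8, 9])   # same sample as the t-test above

n = len(sample_data)
sample_mean = sample_data.mean()
standard_error = stats.sem(sample_data)   # sample std (ddof=1) divided by sqrt(n)

# 95% confidence interval for the mean, based on the t distribution
# with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(
    0.95, df=n - 1, loc=sample_mean, scale=standard_error
)

print(f"mean = {sample_mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

If the hypothesized population mean falls inside this interval, the two-sided t-test at the 5% level would not reject it, so the two tools give a consistent picture.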

Probability

Probability theory is the building block of statistics. It gives us a way to quantify uncertainty and reason about outcomes we can’t predict exactly. In Python, both the standard library’s random module and NumPy’s random module are handy for simulating random events.

Python Code Syntax:

from random import randint
probability_of_heads = sum(randint(0, 1) for _ in range(1000)) / 1000

Use Case: Estimate the probability of getting heads by simulating a series of coin flips.
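
The simulation above gives an estimate, not the exact probability. As a rough sketch of how that estimate behaves, the snippet below repeats the experiment with NumPy at a few sample sizes and compares each result with the theoretical value of 0.5 for a fair coin. The seed and the sample sizes are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded so the run is reproducible

theoretical = 0.5                      # fair coin: P(heads) = 1/2
for n_flips in (10, 1_000, 100_000):
    flips = rng.integers(0, 2, size=n_flips)   # 0 = tails, 1 = heads
    estimate = flips.mean()                    # fraction of heads
    error = abs(estimate - theoretical)
    print(f"{n_flips:>7} flips: estimate = {estimate:.4f}, error = {error:.4f}")
```

With more flips the estimate tends to settle closer to 0.5, which is the law of large numbers at work.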

Sampling Techniques

Sampling means selecting a smaller group from a larger dataset or population. We use methods like random, stratified, or systematic sampling to make sure the small group is a good representation of the whole.

Python Code Syntax:
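
The three methods named above might look like this with Pandas. This is an illustrative sketch, not the article’s original listing: the `population` DataFrame, its columns, the 70/30 region split, and the sample sizes are all hypothetical.

```python
import pandas as pd

# Hypothetical population: 100 customers split across two regions
population = pd.DataFrame({
    "customer_id": range(100),
    "region": ["north"] * 70 + ["south"] * 30,
})

# Simple random sampling: every row has the same chance of being picked
random_sample = population.sample(n=10, random_state=0)

# Stratified sampling: draw 10% from each region so the 70/30 split is kept
stratified_sample = population.groupby("region").sample(frac=0.1, random_state=0)

# Systematic sampling: take every 10th row after a chosen starting row
start, step = 3, 10
systematic_sample = population.iloc[start::step]

print(stratified_sample["region"].value_counts())
```

In practice the starting row for systematic sampling is usually chosen at random; it is fixed here only to keep the example reproducible.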
