Future: Brenda Mutai

Bayesian vs Frequentist

Brenda Mutai — Fri, 17 Oct 2025 08:29:47 +0000

In statistics, there are two main schools of thought for making inferences from data. The Frequentist and Bayesian approaches aim to answer the same questions, such as estimating parameters, testing hypotheses or predicting future outcomes, but they differ in how they interpret probability and uncertainty.
Flip a coin. Before you look at the result, pause and ask: What’s the probability the coin landed on heads?
Depending on your answer, you’re either a Frequentist or a Bayesian, statistically speaking at least.
According to a Frequentist statistical approach, there’s a single correct answer. If the coin is heads, the probability that the coin landed on heads is 100%. If it’s tails, the probability is 0%.
In Bayesian statistics, probabilities are interpreted subjectively. Using the coin toss as an example, a Bayesian would say that the probability of getting heads or tails reflects your personal belief. You might start by assuming there’s an equal 50% chance for each side, but your confidence in the coin’s fairness could shape that belief. After observing the outcome, you would then revise or update your belief in light of the new evidence.
The main difference between the two methodologies is how they handle uncertainty. Frequentists rely on long-term frequencies and assume that probabilities are objective and fixed. Bayesians embrace subjectivity and the idea that probabilities change based on new information.
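
To make the contrast concrete, here is a minimal sketch of both views applied to repeated flips of the same coin. The data (7 heads in 10 flips) and the uniform Beta(1, 1) prior are assumptions chosen for illustration, not figures from this post. The Frequentist estimate is simply the observed proportion of heads, while the Bayesian estimate starts from a prior belief and updates it with the evidence.

```python
def frequentist_estimate(heads, flips):
    """Point estimate of P(heads): the observed relative frequency."""
    return heads / flips


def bayesian_update(prior_alpha, prior_beta, heads, flips):
    """Update a Beta(prior_alpha, prior_beta) belief with coin-flip data.

    The Beta prior is conjugate to the binomial likelihood, so the
    posterior is Beta(prior_alpha + heads, prior_beta + tails).
    """
    tails = flips - heads
    return prior_alpha + heads, prior_beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 7 heads in 10 flips.
heads, flips = 7, 10

# Assumed prior: Beta(1, 1), i.e. every possible bias is equally plausible.
post_alpha, post_beta = bayesian_update(1, 1, heads, flips)

print("Frequentist estimate:", frequentist_estimate(heads, flips))
print("Bayesian posterior mean:", round(beta_mean(post_alpha, post_beta), 4))
```

With a stronger prior belief that the coin is fair, for example Beta(50, 50), the same ten flips would move the posterior mean much less.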

Bayesian vs Frequentist

Brenda Mutai — Fri, 17 Oct 2025 08:07:45 +0000

The Importance of Skewness and Kurtosis in EDA

Brenda Mutai — Tue, 30 Sep 2025 10:29:27 +0000

Once the data has been collected and carefully cleaned, the next step is to dive into exploring it. This process, called Exploratory Data Analysis (EDA), plays a vital role in any data project. The insights uncovered during EDA guide and influence the decisions made throughout the entire workflow.
A key activity in EDA is examining the distribution shapes of your variables. Understanding these shapes directly impacts later decisions, including:

Preprocessing steps
Feature selection strategies
Algorithm selection
Detecting outliers and deciding whether to remove them

While visualization is useful, it’s often necessary to have numerical measures for greater reliability. Two important metrics for this are skewness and kurtosis, which help evaluate how closely your data’s distribution aligns with the ideal normal distribution.

SKEWNESS

Skewness is a statistical measure that captures the asymmetry of a distribution around its mean. In a perfectly normal distribution, both tails are balanced, but if one side extends farther than the other, the data becomes skewed. Skewness quantifies the extent of this imbalance.
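Accurately identifying and measuring skewness helps reveal how data values are distributed around the mean and guides the selection of appropriate statistical methods or transformations. For example, when a distribution is highly right-skewed, applying a transformation such as a logarithm or Box-Cox can bring it closer to a normal distribution, which in turn can improve the performance of models that assume roughly symmetric inputs. Plain rescaling (such as min-max scaling or standardization) changes the units but not the shape, so it does not reduce skewness.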
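
One way to check whether a transformation actually helps is to measure skewness before and after applying it. The sketch below uses only the standard library; the incomes list is invented for illustration, and sample_skewness is a hypothetical helper implementing the bias-adjusted sample skewness. A plain rescaling is included for comparison.

```python
import math
import statistics


def sample_skewness(values):
    """Bias-adjusted sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / std) ** 3 for x in values)


# Hypothetical right-skewed data: most values are small, two are very large.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 120, 300]

# A log transform compresses large values more than small ones.
log_incomes = [math.log(x) for x in incomes]

# Dividing every value by a constant only changes the units.
scaled = [x / 1000 for x in incomes]

print("Skewness before:         ", round(sample_skewness(incomes), 3))
print("Skewness after log:      ", round(sample_skewness(log_incomes), 3))
print("Skewness after rescaling:", round(sample_skewness(scaled), 3))
```

On this sample the log transform lowers the skewness without removing it entirely, while rescaling leaves it unchanged.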

Types of Skewness
There are three types of skewness: positive, negative, and zero skewness.
1. Zero skewness
Zero skewness means the distribution is perfectly symmetrical around its mean. The mean, median, and mode are all at the center point.

2. Positive skewness
A positively skewed (right-skewed) distribution has a longer right tail, with the mean greater than the median and the mode being the smallest. Most values cluster on the left, while a few extreme values stretch the distribution to the right.

3. Negative skewness
A negatively skewed (left-skewed) distribution has a longer left tail, with the mean less than the median and the mode being the largest. Most values cluster on the right, while extreme values pull the distribution to the left.
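
The mean-versus-median relationship described for the three types can be checked quickly in code. This is a rough sketch: the three samples are invented for illustration, describe_skew_direction is a hypothetical helper, and comparing mean and median is only a heuristic that can mislead on unusual distributions.

```python
import statistics


def describe_skew_direction(values):
    """Compare mean and median as a quick, rough check of skew direction."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    if mean > median:
        direction = "right-skewed (positive)"
    elif mean < median:
        direction = "left-skewed (negative)"
    else:
        direction = "roughly symmetric"
    return mean, median, direction


# Hypothetical samples, one per type of skewness.
samples = {
    "symmetric": [1, 2, 3, 4, 5, 6, 7],
    "right": [1, 2, 2, 3, 3, 4, 20],
    "left": [1, 17, 18, 18, 19, 19, 20],
}

for name, sample in samples.items():
    mean, median, direction = describe_skew_direction(sample)
    print(f"{name}: mean={mean:.2f}, median={median}, {direction}")
```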

How to calculate skewness
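There are several ways to calculate skewness.
a. Pearson’s second skewness coefficient

This is also known as median skewness. It compares the mean with the median and scales the difference by the standard deviation:

$$ Sk_2 = \frac{3\,(\bar{x} - \text{median})}{s} $$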

Let’s implement the formula manually in Python:

import numpy as np
import pandas as pd

# health dataset
bmi = pd.Series([22, 24, 27, 30, 35, 40, 18, 25, 29, 32])

mean_bmi = bmi.mean()
median_bmi = bmi.median()
std_bmi = bmi.std()

skewness_bmi = (3 * (mean_bmi - median_bmi)) / std_bmi

print(
    f"The Pearson's second skewness score of BMI distribution is {skewness_bmi:.5f}"
)

b. Moment-Based Formula (used in statistics libraries)
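The more general definition of skewness uses the third standardized moment. The bias-adjusted sample version, which matches the code that follows, is:

$$ G_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 $$

Where:

n represents the number of values in a distribution
x_i denotes each data point
x̄ is the sample mean
s is the sample standard deviation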

import numpy as np
import pandas as pd

def moment_based_skew(distribution):
    n = len(distribution)
    mean = np.mean(distribution)
    std = np.std(distribution, ddof=1)  # sample std (n - 1), which this formula assumes

    # Formula broken into two parts
    first_part = n / ((n - 1) * (n - 2))
    second_part = np.sum(((distribution - mean) / std) ** 3)

    skewness = first_part * second_part
    return skewness

# Example health dataset: BMI values
bmi = pd.Series([18, 21, 23, 25, 27, 30, 34, 38, 42, 45])

print("Moment-based skewness of BMI distribution:", moment_based_skew(bmi))

Built-in methods from pandas or scipy:

import pandas as pd
from scipy.stats import skew

# BMI values
bmi = pd.Series([18, 21, 23, 25, 27, 30, 34, 38, 42, 45])

# Pandas version
print("Pandas skewness:", bmi.skew())

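# SciPy version (biased estimator by default, so it can differ slightly from pandas;
# passing bias=False applies the small-sample correction)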
print("SciPy skewness:", skew(bmi))
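
The two built-ins are not guaranteed to print the same number. As their documentation describes it, pandas’ skew() applies a small-sample bias correction, while scipy.stats.skew uses the biased estimator unless bias=False is passed. The standard-library sketch below shows how the two estimators relate; the function names are hypothetical and the data is the BMI list from the example.

```python
import math


def biased_skewness(values):
    """Biased skewness g1 = m3 / m2**1.5, with moments divided by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def adjusted_skewness(values):
    """Bias-adjusted skewness G1 = g1 * sqrt(n * (n - 1)) / (n - 2)."""
    n = len(values)
    return biased_skewness(values) * math.sqrt(n * (n - 1)) / (n - 2)


bmi = [18, 21, 23, 25, 27, 30, 34, 38, 42, 45]

print("Biased (g1):  ", round(biased_skewness(bmi), 5))
print("Adjusted (G1):", round(adjusted_skewness(bmi), 5))
```

The correction factor shrinks toward 1 as n grows, so the gap between the two values matters mostly for small samples like this one.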

KURTOSIS

While skewness describes the asymmetry of a distribution, kurtosis measures how heavy its tails are, which is often described loosely as peakedness or flatness. High kurtosis means heavy tails and a greater chance of extreme values, usually alongside a sharp peak.
Low kurtosis, on the other hand, indicates lighter tails and fewer extreme events, usually with a flatter peak. For reference, a normal distribution has a kurtosis of exactly 3, which is an excess kurtosis of 0.
Types of Kurtosis
Based on kurtosis values, distributions are classified into three types:

Mesokurtic (kurtosis = 3, excess = 0): resembles a normal distribution.
Leptokurtic (kurtosis > 3, excess > 0): tall peak with heavy tails.
Platykurtic (kurtosis < 3, excess < 0): flatter peak with lighter tails.

How to calculate kurtosis
If you want a manual calculation of kurtosis, you can use the following formula:

Where:

n = number of observations
x̄ = sample mean
s = sample standard deviation
x_i = each data point
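
As a hedged sketch of a manual calculation, the function below assumes the bias-corrected sample formula for excess kurtosis, which is built from the same n, x̄, s and x_i listed above. Which variant the post intends is an assumption here. The helper names and the 0.1 tolerance used for labelling are hypothetical choices, not an established convention.

```python
import statistics


def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (a normal distribution gives about 0)."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)

    # Sum of the fourth powers of the standardized deviations.
    fourth_power_sum = sum(((x - mean) / s) ** 4 for x in values)

    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_power_sum - correction


def classify_kurtosis(excess, tolerance=0.1):
    """Label a distribution by excess kurtosis; the tolerance is an arbitrary cutoff."""
    if excess > tolerance:
        return "leptokurtic"
    if excess < -tolerance:
        return "platykurtic"
    return "mesokurtic"


bmi = [18, 21, 23, 25, 27, 30, 34, 38, 42, 45]
excess = sample_excess_kurtosis(bmi)
print(f"Excess kurtosis: {excess:.4f} ({classify_kurtosis(excess)})")
```

Because this returns excess kurtosis, add 3 to compare the result with the "kurtosis = 3" reference point used in the classification above.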

In Python, you can calculate kurtosis the same way as skewness, by using Pandas or SciPy.

import pandas as pd
from scipy.stats import kurtosis

# BMI values
bmi = pd.Series([18, 21, 23, 25, 27, 30, 34, 38, 42, 45])

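# SciPy returns excess kurtosis by default (fisher=True), so a normal distribution scores 0, not 3
print("Excess kurtosis of BMI distribution:", kurtosis(bmi))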

In pandas, kurtosis can be calculated using either kurt or kurtosis. The two names are aliases, and both work on Series objects as well as entire DataFrames. They return bias-corrected excess kurtosis, so a normal distribution scores about 0.

import pandas as pd

#health dataset
health = pd.DataFrame({
    "BMI": [18, 21, 23, 25, 27, 30, 34, 38, 42, 45],
    "BloodPressure": [110, 115, 120, 118, 125, 130, 135, 140, 145, 150],
    "Cholesterol": [160, 170, 175, 180, 185, 190, 200, 210, 220, 230]
})

# Kurtosis of a single column (Series)
print("BMI kurtosis:", health["BMI"].kurt())

# Kurtosis of all numeric columns (DataFrame)
print("\nKurtosis of all health metrics:\n", health.kurtosis())

Skewness and kurtosis are powerful metrics in exploratory data analysis. Skewness helps us understand the asymmetry of a distribution, while kurtosis highlights its peakedness and tail behavior. Together, they provide deeper insights beyond simple measures like mean and variance, guiding decisions on preprocessing, transformations, and model selection. By combining visual inspection with these statistical measures, analysts can better assess data quality and prepare it for reliable modeling.

Similarities Between a Stored Procedure (SQL) and a Python Function

Brenda Mutai — Mon, 08 Sep 2025 12:56:25 +0000

SQL stored procedures and Python functions seem to live in completely different worlds: databases versus programming.
Yet they share several similarities.

1. Encapsulation of logic

A stored procedure encapsulates a set of SQL operations.
A Python function encapsulates a block of Python code.

2. Reusability

A stored procedure can be called multiple times with different parameters.
A Python function can be invoked whenever needed.

3. Use of Parameters

Stored procedures accept input parameters.
Python functions accept arguments.

EXAMPLE
CREATE PROCEDURE GetOrdersByCustomer
    @CustomerID INT
AS
BEGIN
    SELECT *
    FROM orders
    WHERE customer_id = @CustomerID;
END;
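
For comparison, a Python function with the same shape might look like the sketch below. It uses Python’s built-in sqlite3 module with an in-memory orders table; the column names (order_id, customer_id, total_amount) and the sample rows are assumptions made for illustration, and SQLite simply stands in for whichever database the procedure would live in.

```python
import sqlite3


def get_orders_by_customer(conn, customer_id):
    """Python counterpart of GetOrdersByCustomer: one parameter in, rows out."""
    cursor = conn.execute(
        "SELECT order_id, customer_id, total_amount "
        "FROM orders WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    )
    return cursor.fetchall()


# Hypothetical in-memory table standing in for a real orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 250.0), (2, 11, 90.0), (3, 10, 600.0)],
)

# The same function is reused with different arguments.
print(get_orders_by_customer(conn, 10))
print(get_orders_by_customer(conn, 11))
```

The ? placeholder plays the role of @CustomerID: the value is passed as a parameter rather than pasted into the SQL text.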

4. Return Values

A stored procedure can return result sets or output values.
Python functions return values explicitly using return.

5. Error Handling

Both can include error handling: stored procedures with TRY...CATCH, Python functions with try...except.
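
On the Python side, the try...except pattern could be sketched as follows. The function name, the orders table and its columns are hypothetical, and SQLite (via the built-in sqlite3 module) is used only so the example runs without a database server.

```python
import sqlite3


def safe_total_for_customer(conn, customer_id):
    """Return a customer's order total, or None if the query fails."""
    try:
        row = conn.execute(
            "SELECT COALESCE(SUM(total_amount), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        return row[0]
    except sqlite3.Error as exc:
        # Plays the role of the CATCH block in a stored procedure.
        print(f"Query failed: {exc}")
        return None


conn = sqlite3.connect(":memory:")

# The orders table does not exist yet, so the query raises and is caught.
print(safe_total_for_customer(conn, 10))

conn.execute("CREATE TABLE orders (customer_id INTEGER, total_amount REAL)")
conn.execute("INSERT INTO orders VALUES (10, 250.0)")
print(safe_total_for_customer(conn, 10))
```

Returning None on failure is one design choice among several; re-raising the exception after logging it is equally common.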

Stored procedures in SQL are like functions in Python: both encapsulate reusable logic, accept parameters, and produce results, but they operate in different environments, databases versus applications.

The Difference Between Subqueries, CTEs, and Stored Procedures

Brenda Mutai — Mon, 08 Sep 2025 11:31:25 +0000

When working with SQL, you will often encounter subqueries, CTEs (Common Table Expressions) and stored procedures. They may look similar since they all deal with querying and organizing data, but each serves a different purpose.

1. A subquery

A subquery is a query nested inside another query. It’s often used to fetch intermediate results for the main query. (Think of it as asking a question inside another question.)
Its main role is to return a result set that the outer query uses.

EXAMPLE
SELECT first_name, last_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE total_amount > 500
);

When to use subqueries

Filtering results (WHERE, EXISTS, IN)
Calculating aggregated values

The pros of subqueries are that they are simple and quick to write; the downside is that they become hard to read when queries get complex.
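
To see the subquery in action without a database server, it can be run against a small in-memory SQLite database from Python. The table layout follows the example query, but the customer names and order amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total_amount REAL);
    INSERT INTO customers VALUES (1, 'Amina', 'Otieno'), (2, 'Brian', 'Kamau');
    INSERT INTO orders VALUES (100, 1, 750.0), (101, 2, 120.0), (102, 2, 300.0);
    """
)

# Conceptually, the inner query yields the customer_ids with an order over 500;
# the outer query then keeps only customers whose id is in that set.
query = """
    SELECT first_name, last_name
    FROM customers
    WHERE customer_id IN (
        SELECT customer_id FROM orders WHERE total_amount > 500
    )
"""
big_spenders = conn.execute(query).fetchall()
print(big_spenders)
```

Only the first hypothetical customer has a single order above 500, so the second one is filtered out even though their orders exist.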

2. Common Table Expression(CTE)
A CTE is defined with the WITH clause. It is a temporary, named result set that exists only within the scope of a single SELECT, INSERT, UPDATE or DELETE statement. A CTE’s main role in SQL is to improve readability and maintainability, and it is great for breaking complex queries into smaller pieces.

EXAMPLE
WITH customer_totals AS (
    SELECT customer_id, SUM(total_amount) AS total_spent
    FROM orders
    GROUP BY customer_id
)
SELECT c.first_name, c.last_name, t.total_spent
FROM customers c
JOIN customer_totals t ON c.customer_id = t.customer_id
WHERE t.total_spent > 500;

When to use

There is no need to create a separate summary table.
Always uses up-to-date values.
Keeps queries dynamic and flexible.

The pros are improved readability and support for recursion; the con is that a CTE cannot be reused outside the statement that defines it.
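
The CTE example can be tried the same way, using Python’s built-in sqlite3 module and an in-memory database. The sample rows are invented for illustration. The last step shows that the name customer_totals is gone once the statement has finished.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total_amount REAL);
    INSERT INTO customers VALUES (1, 'Amina', 'Otieno'), (2, 'Brian', 'Kamau');
    INSERT INTO orders VALUES (100, 1, 400.0), (101, 1, 350.0), (102, 2, 300.0);
    """
)

query = """
    WITH customer_totals AS (
        SELECT customer_id, SUM(total_amount) AS total_spent
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.first_name, c.last_name, t.total_spent
    FROM customers c
    JOIN customer_totals t ON c.customer_id = t.customer_id
    WHERE t.total_spent > 500
"""
print(conn.execute(query).fetchall())

# The CTE only exists inside the statement that defines it.
try:
    conn.execute("SELECT * FROM customer_totals")
except sqlite3.OperationalError as exc:
    print("Outside the WITH statement:", exc)
```

Unlike the subquery example, no single order here exceeds 500; the first customer qualifies only because the CTE sums their orders first.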

3. Stored Procedure
A stored procedure is SQL code saved in the database, which you can call whenever needed. It can include multiple queries, loops, and conditional logic.
The main role of a stored procedure in SQL is to encapsulate and execute a set of SQL statements and procedural logic as a single, pre-compiled unit within the database.

EXAMPLE
CREATE PROCEDURE GetVIPCustomers()
BEGIN
    SELECT c.first_name, c.last_name, SUM(o.total_amount) AS total_spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id
    HAVING total_spent > 500;
END;

The pros are reusability and the ability to encapsulate complex operations; the con is that stored procedures are harder to debug.

PostgreSQL Installation Guide for Linux Servers

Brenda Mutai — Sat, 02 Aug 2025 14:31:12 +0000

Introduction
PostgreSQL is an advanced, open-source relational database system known for its performance, extensibility and standards compliance.
It’s a go-to choice for developers and database administrators across the globe.

Prerequisites

Operating system (Ubuntu, CentOS, Debian)
Stable internet connection
Terminal
User access with sudo privileges

Installation guide on Ubuntu

1. Update your system

Ensure the system package list is up to date before installing new software.

For Ubuntu

sudo apt update && sudo apt upgrade

2. Install PostgreSQL

sudo apt install postgresql postgresql-contrib

postgresql: the main database server
postgresql-contrib: adds useful extensions

3. Verify Installation

psql --version
sudo systemctl status postgresql
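
If you would rather script the version check, for example as part of a setup script, a small Python helper could wrap the same psql --version command. This is only a sketch: postgres_client_version is a hypothetical name, and it checks that the psql client is on the PATH, not that the server is running.

```python
import shutil
import subprocess


def postgres_client_version():
    """Return the output of `psql --version`, or None if psql is not on PATH."""
    if shutil.which("psql") is None:
        return None
    result = subprocess.run(
        ["psql", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


version = postgres_client_version()
print(version if version else "psql not found; PostgreSQL may not be installed")
```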

PostgreSQL User Setup
Switch to the postgres user

sudo su - postgres

Access the PostgreSQL prompt.
psql

Set a password for the postgres user
ALTER USER postgres WITH PASSWORD 'your_secure_password';

Quit psql, then exit the postgres user’s shell.
\q
exit

Post-Installation Tips

Database users should use strong passwords.
Restrict remote access using a firewall.
Back up your database regularly.