## Blog

## World Bank data on maternal mortality using R

The World Bank provides open data for many indicators across most countries, spanning the last few decades. This data is available online and can be searched by country code (iso2c and iso3c), indicator name, and date. The indicators can be...

## R tutorial: Testing assumptions for parametric tests

In this post, written as an R-markdown file and posted on RPubs, I discuss the assumptions for the use of parametric tests in R. Parametric tests such as the various t tests, analysis of variance (ANOVA), and correlations are only valid if certain...

## RPubs markdown files and YouTube videos on R

R is a programming language designed by statisticians for statistical analysis. It is a free programming language and is available for download (Windows, Mac, and Linux). Bar a few eccentricities, it is quite easy to learn R. We make extensive use of it in the Klopper...

## Understanding binomial logistic regression using R

Logistic regression is a statistical method that uses independent variables (categorical or numerical) to predict a categorical dependent variable. It is based on the principles of linear regression. As the outcome (dependent) variable is categorical, though, logistic...

## Assumptions for the use of parametric tests in R

In this post I discuss some of the assumptions that must be met for the use of parametric statistical tests. The post contains snippets in the R statistical programming language to help visualize the concepts and to show how these assumptions are tested. Click on the...

## Course on SPSS for medical statistics

At a recent meeting of fellow surgeons in my department, an interesting difference of opinion arose. It relates to our trainees’ knowledge of statistics. Unfortunately, the meeting did not allow any time to properly discuss the topic. Some background to illuminate...

## Sharing your machine learning models with others

So, you've spent a lot of time and effort creating your Python machine learning model. The parameters have been tweaked and the metrics look great. Now what? How do you share it with others to use? Well, one easy way is to pickle it. The pickle...
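As a minimal sketch of the idea, the snippet below saves a model to a file with Python's standard `pickle` module and loads it back. The `model` dictionary, the `predict` helper, and the file name `model.pkl` are hypothetical stand-ins invented for illustration, not the model from the post; a fitted scikit-learn estimator can be passed to `pickle.dump` in the same way.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: learned parameters of a
# one-feature linear classifier (assumed values, for illustration only).
model = {"weight": 2.0, "bias": -1.0}


def predict(params, values):
    """Return 1 where weight * value + bias is non-negative, else 0."""
    return [1 if params["weight"] * v + params["bias"] >= 0 else 0 for v in values]


# Write the model to disk in binary mode.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as handle:
    pickle.dump(model, handle)

# Later, or on another machine, load it back and use it.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

predictions = predict(restored, [0.0, 0.5, 1.0])
print(predictions)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.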

## K-means clustering using Python

The scikit-learn library for Python is a powerful machine learning tool. K-means clustering, which is easily implemented in Python, uses geometric distance to create centroids around which our data can fit as clusters. In the example attached to this...
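To make the role of geometric distance concrete, here is a rough sketch of the K-means loop written in plain Python rather than with scikit-learn. It is not the example attached to the post: the `kmeans` function, the six sample points, and the choice of starting centroids are all assumptions made for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain K-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points, and repeat."""

    def nearest(point, centers):
        # Index of the centroid with the smallest Euclidean distance.
        return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]

    labels = [nearest(point, centroids) for point in points]
    return centroids, labels


# Made-up data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

In scikit-learn the same steps are wrapped by the `KMeans` estimator in `sklearn.cluster`, which also handles choosing the starting centroids.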

## Predicting appendicitis using machine learning in Mathematica

I note more and more published papers on machine learning. As a clinician, I find it a fascinating way of looking at patient data. In case you are not familiar with machine learning, the definition given over at Wikipedia is: Machine learning is the...

## Teaching statistics and data science in medical school

Understanding statistical analysis and interpreting the results of research papers are just as important as the ability to correctly diagnose the cause of acute abdominal pain. Medical knowledge is expanding at a rapid pace. This is evident from the number...