Blog
JHU coronavirus analysis end 2020
There are numerous analyses on the internet and in research papers regarding COVID-19. Data from the pandemic is very useful for creating educational material. The Johns Hopkins University (JHU) data repository contains large open data sets on the pandemic.
In this notebook, I showcase the use of this data resource. The aims are as follows:
Use the JHU data as teaching material for the R language
Use the JHU data as teaching material for data analysis
Compare data between countries (South Africa, Germany, United Kingdom)
Look ahead at what may happen in South Africa in early 2021
R tutorial: Just getting started with R? Here is a post on inspecting univariate data
If you are new to R, then perhaps a look at simple univariate data is a good place to start. In this RPubs post, I take a look at both categorical and numerical data. It is quite easy to calculate descriptive statistics of univariate data and to...
World Bank data on maternal mortality using R
The World Bank provides open data for many indicators across most countries, spanning the last few decades. The data is available online and can be searched by country code (iso2c and iso3c), indicator name, and date. The indicators can be...
R tutorial: Testing assumptions for parametric tests
In this post, written as an R-markdown file and posted on RPubs, I discuss the assumptions for the use of parametric tests in R. Parametric tests such as the various t tests, analysis of variance (ANOVA), and correlations are only valid if certain...
RPubs markdown files and YouTube videos on R
R is a programming language designed by statisticians for statistical analysis. It is a free programming language and is available for download (Windows, Mac, and Linux). Bar a few eccentricities, it is quite easy to learn R. We make extensive use of it in the Klopper...
Understanding binomial logistic regression using R
Logistic regression is a statistical model that uses independent variables (categorical or numerical) to predict a categorical dependent variable. It is based on the principles of linear regression. As the outcome (dependent) variable is categorical, though, logistic...
Assumptions for the use of parametric tests in R
In this post I discuss some of the assumptions that must be met for the use of parametric statistical tests. The post contains snippets in the R statistical programming language to help visualize the concepts and to show how these assumptions are tested. Click on the...
Course on SPSS for medical statistics
At a recent meeting of fellow surgeons in my department, an interesting difference of opinion arose. It relates to our trainees’ knowledge of statistics. Unfortunately, the meeting did not allow any time to properly discuss the topic. Some background to illuminate...
Sharing your machine learning models with others
So, you've spent a lot of time and effort in creating your Python machine learning model. The parameters have been tweaked and the metrics look great. Now what? How do you share it with others to use? Well, one easy way is to pickle it. The...
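As a rough illustration of the pickling idea, here is a minimal sketch using only the Python standard library. It is not the code from the post: the "model" is a plain dict of fitted parameters, and `fit_line`, `predict`, `save_model`, `load_model`, and the file name `model.pkl` are all hypothetical names chosen for this example. A fitted scikit-learn estimator can be written and read with the same `pickle.dump` and `pickle.load` calls.

```python
import os
import pickle
import tempfile


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (toy stand-in for a model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Apply the fitted parameters to a new value."""
    return model["slope"] * x + model["intercept"]


def save_model(model, path):
    """Serialize the model object to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Read the model back. Only unpickle files from sources you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Train, save to a temporary directory, and restore.
model = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(model, path)
restored = load_model(path)
```

The person receiving `model.pkl` needs the same code (or library) that defines how to use the object, and should only load pickle files from a trusted source, since unpickling can execute arbitrary code.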
K-means clustering using Python
The scikit-learn library for Python is a powerful machine learning tool. K-means clustering, which is easily implemented in Python, uses geometric distance to place centroids around which the data points are grouped into clusters. In the example attached to this...
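To make the centroid idea concrete, here is a small pure-Python sketch of the standard k-means iteration (Lloyd's algorithm). It is not the scikit-learn implementation and not the example attached to the post; the function name `kmeans`, the toy points, and the hand-picked starting centroids are assumptions made for illustration. In practice, `sklearn.cluster.KMeans` would be the usual choice.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Cluster points by repeating two steps until the centroids stop moving.

    points    : list of equal-length numeric tuples
    centroids : initial centroid guesses (supplied by the caller so the
                result is deterministic in this sketch)
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Two visibly separated groups of toy points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Here the first three points end up in one cluster and the last three in the other, with each centroid sitting at the mean of its group.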