JHU coronavirus analysis end 2020



There are numerous analyses on the internet and in research papers regarding COVID-19. Data from the pandemic is very useful for creating educational material. The Johns Hopkins University (JHU) data repository contains large open data sets on the pandemic.

In this notebook, I showcase the use of this data resource. The aims are as follows:

Use the JHU data as teaching material for the R language
Use the JHU data as teaching material for data analysis
Compare data between countries (South Africa, Germany, United Kingdom)
Look ahead at what may happen in South Africa in early 2021
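As a minimal sketch of the kind of country comparison the notebook performs, the snippet below uses a tiny hand-made table in the JHU wide format (one column per date of cumulative confirmed cases, with invented numbers) rather than downloading the real data set, and derives daily new cases per country.

```python
import pandas as pd

# Tiny stand-in for the JHU wide-format time series; the case counts are invented
jhu_style = pd.DataFrame({
    "Country/Region": ["South Africa", "Germany", "United Kingdom"],
    "1/1/21": [1000000, 1700000, 2500000],
    "1/2/21": [1015000, 1715000, 2550000],
    "1/3/21": [1032000, 1728000, 2605000],
})

# Reshape to long format: one row per country per date
long = jhu_style.melt(id_vars="Country/Region",
                      var_name="date", value_name="cumulative_cases")

# Daily new cases = difference of the cumulative counts within each country
long["new_cases"] = long.groupby("Country/Region")["cumulative_cases"].diff()

print(long[long["Country/Region"] == "South Africa"])
```

The same melt-then-diff pattern applies unchanged to the real JHU CSV, which has one column per date of cumulative counts.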

View the complete RPub document here 



South Africa lags behind other countries in the COVID-19 timeline. Cases there rose much higher after the first wave, and the case load may be very high in the first part of 2021.

While a current strain of SARS-CoV-2 is considered more infective, there may be confounding factors: human activity and interaction have increased markedly, especially since the progressive lifting of restrictions. The festive season may worsen upcoming case numbers.

Seroprevalence studies in South Africa are showing a much higher level of infection than confirmed case counts suggest. Vaccines will take the better part of 2021 to reach large parts of South Africa.

Sharing your machine learning models with others




So, you’ve spent a lot of time and effort creating your Python machine learning model. The parameters have been tweaked and the metrics look great.

Now what? How do you share it with others to use? Well, one easy way is to pickle it. The pickle library in Python allows you to write your model to a file that others can open. They can then simply enter their own data for prediction.
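A minimal sketch of the idea, assuming scikit-learn is installed on both ends (the synthetic data and the file name "model.pkl" are illustrative, not from the tutorial):

```python
import pickle

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Train a small random forest on synthetic regression data
X, y = make_regression(n_samples=100, n_features=4, random_state=42)
model = RandomForestRegressor(n_estimators=10, random_state=42)
model.fit(X, y)

# Write the fitted model to disk as a pickle file
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Anyone with the file (and a compatible scikit-learn version) can reload it
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

# The reloaded model predicts exactly as the original does
print(loaded.predict(X[:1]))
```

Note that the recipient needs the same libraries (and ideally the same versions) available for the unpickled model to work.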

In this YouTube tutorial I create a random forest regressor model, export it as a pickle file, and then import it for use.  Have a look at how easy it all is.

K-means clustering using Python


The scikit-learn library for Python is a powerful machine learning tool.
K-means clustering, which is easily implemented in Python, uses geometric distance to create centroids around which our data can be grouped into clusters.
In the example attached to this article, I look at 99 hypothetical patients who are prompted to sync their smart-watch healthcare app data with a research team. The data is recorded continuously, but to comply with healthcare regulations, the patients have to actively synchronize it. The example works equally well if we consider 99 hypothetical customers responding to a marketing campaign.
To prompt them, several reminder campaigns are run each year; in total there are 32 campaigns. Each campaign consists of only one of the following reminders: e-mail, short message service, online message, telephone call, pamphlet, or letter. A record is kept of when the patients sync their data, as a marker of response to the campaign.
Our goal is to cluster the patients so that we can learn which campaign type each responds to. This can be used to tailor their reminders for the next year.
In the attached video, I show you just how easy this is to accomplish in Python. I use the Python kernel in a Jupyter notebook. There is also a mention of dimensionality reduction using principal component analysis, also done using scikit-learn. This is done so that we can view the data as a scatter plot using the plotly library.
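The workflow can be sketched as follows, using randomly generated binary response data in place of the real example (the cluster count of six, matching the number of reminder types, is one possible choice, not necessarily the one used in the video):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# 99 hypothetical patients x 32 campaigns: 1 = synced after the campaign, 0 = no response
responses = rng.integers(0, 2, size=(99, 32))

# Fit K-means; each patient is assigned to the nearest of six centroids
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
labels = kmeans.fit_predict(responses)

# Project the 32-dimensional data onto two principal components for plotting
pca = PCA(n_components=2)
coords = pca.fit_transform(responses)

print(labels.shape, coords.shape)  # (99,) (99, 2)
```

The two PCA coordinates, coloured by cluster label, can then be passed to plotly (or any plotting library) as a 2-D scatter plot.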

You can view the video here.