The scikit learn library for python is a powerful machine learning tool.

K means clustering, which is easily implemented in python, uses geometric distance to create centroids around which our data can fit as clusters.

In the example attached to this article, I view 99 hypothetical patients that are prompted to sync their smart watch healthcare app data with a research team. The data is recorded continuously, but to comply with healthcare regulations, they have to actively synchronize the data.  This example works equally well is we consider 99 hypothetical customers responding to a marketing campaign.

In order to prompt them, several reminder campaigns are run each year. In total there are 32 campaigns. Each campaign consists only of one of the following reminders: e-mail, short-message-service, online message, telephone call, pamphlet, or a letter. A record is kept of when they sync their data, as a marker of response to the campaign.

Our goal is to cluster the patients so that we can learn which campaign type they respond to. This can be used to tailor their reminders for the next year.

In the attached video, I show you just how easy this is to accomplish in python. I use the python kernel in a Jupyter notebook. There will also a mention of dimensionality reduction using principal component separation, also done using scikit learn. This is done so that we can view the data as a scatter plot using the plotly library.

Video

K-means clustering

Share This