Sharing your machine learning models with others

So, you’ve spent a lot of time and effort creating your Python machine learning model.  The parameters have been tweaked and the metrics look great.

Now what?  How do you share it with others to use?  Well, one easy way is to pickle it.  The pickle library in Python allows you to write your model as a file that others can open.  They can then simply enter their own data for prediction.

In this YouTube tutorial I create a random forest regressor model, export it as a pickle file, and then import it for use.  Have a look at how easy it all is.
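As a rough sketch of the export and import steps, the snippet below pickles a model and loads it again. So that it runs with only the standard library, it uses a toy MeanRegressor class (a hypothetical stand-in, not the random forest regressor from the tutorial) and made-up training data. The same pickle.dump and pickle.load calls work for a fitted scikit-learn estimator.

```python
import pickle
import tempfile
from pathlib import Path


class MeanRegressor:
    """Toy stand-in for a trained model (hypothetical, not from the tutorial)."""

    def fit(self, X, y):
        # "Training" here just stores the mean of the targets.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the stored mean for every row of new data.
        return [self.mean_ for _ in X]


# Fit on made-up data.
model = MeanRegressor().fit([[1], [2], [3]], [10.0, 20.0, 30.0])

# Export: write the fitted model to a file opened in binary mode.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Import: read the file back and predict on new data.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

predictions = loaded.predict([[4], [5]])
print(predictions)
```

Two caveats apply when sharing the file. Whoever loads it needs the model's class available (for a scikit-learn model, that means having a compatible scikit-learn version installed), and a pickle file should only be opened if it comes from a source you trust, because unpickling can execute arbitrary code.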

K-means clustering using Python

The scikit-learn library for Python is a powerful machine learning tool.

K-means clustering, which is easily implemented in Python, uses geometric (Euclidean) distance to place centroids around which the data points are grouped into clusters.
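To make the idea of centroids and distance concrete, here is a minimal hand-rolled sketch of the standard k-means loop in plain Python. It is not the scikit-learn implementation used in the video. The kmeans function, its parameters, and the six sample points are all illustrative assumptions.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0, init=None):
    """Plain k-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly chosen data points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Two made-up groups of points, one near (1, 1) and one near (8, 8).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The loop alternates between assigning every point to its closest centroid and moving each centroid to the mean of the points assigned to it, and it stops once the assignments no longer change.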

In the example attached to this article, I view 99 hypothetical patients who are prompted to sync their smart watch healthcare app data with a research team. The data is recorded continuously, but to comply with healthcare regulations, they have to actively synchronize the data.  This example works equally well if we consider 99 hypothetical customers responding to a marketing campaign.

In order to prompt them, several reminder campaigns are run each year. In total there are 32 campaigns. Each campaign consists of only one of the following reminders: e-mail, short-message-service, online message, telephone call, pamphlet, or a letter. A record is kept of when they sync their data, as a marker of response to the campaign.

Our goal is to cluster the patients so that we can learn which campaign type they respond to. This can be used to tailor their reminders for the next year.
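One way to picture the data is as a table with one row per patient and one column per campaign, holding 1 if the patient synced after that campaign and 0 otherwise. The sketch below uses a made-up miniature of such a table (4 patients, 6 campaigns) together with made-up cluster labels to show how, once patients are clustered, the response rate per campaign type could be read off for each cluster. The function name and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical miniature of the data set: the reminder type of each campaign,
# and one row of 0/1 sync responses per patient.
campaign_types = ["e-mail", "SMS", "e-mail", "letter", "SMS", "letter"]
responses = [
    [1, 0, 1, 0, 0, 0],  # patient 0
    [1, 0, 1, 0, 1, 0],  # patient 1
    [0, 1, 0, 0, 1, 0],  # patient 2
    [0, 1, 0, 1, 1, 0],  # patient 3
]
# Cluster labels as a clustering step might return them (made up here).
labels = [0, 0, 1, 1]


def response_rate_by_type(responses, labels, campaign_types):
    """Return {cluster: {campaign type: share of prompts followed by a sync}}."""
    hits = defaultdict(lambda: defaultdict(int))
    prompts = defaultdict(lambda: defaultdict(int))
    for row, cluster in zip(responses, labels):
        for synced, kind in zip(row, campaign_types):
            hits[cluster][kind] += synced
            prompts[cluster][kind] += 1
    return {
        cluster: {
            kind: hits[cluster][kind] / prompts[cluster][kind]
            for kind in prompts[cluster]
        }
        for cluster in prompts
    }


rates = response_rate_by_type(responses, labels, campaign_types)
for cluster, by_type in sorted(rates.items()):
    print(cluster, by_type)
```

In this toy table, cluster 0 responds to every e-mail prompt and cluster 1 to every SMS prompt, which is the kind of pattern that could guide next year's reminders.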

In the attached video, I show you just how easy this is to accomplish in Python. I use the Python kernel in a Jupyter notebook. There is also a mention of dimensionality reduction using principal component analysis (PCA), also done using scikit-learn. This is done so that we can view the data as a scatter plot using the plotly library.

Video

K-means clustering

Teaching statistics and data science in medical school

Understanding statistical analysis and interpreting the results of research papers are just as important as the ability to correctly diagnose the cause of acute abdominal pain.

Medical knowledge is expanding at a rapid pace. This is evident from the number of research papers being published every year. Although medical students and residents attend a formal education program, it is journal papers that serve as masters of education for the majority of a professional’s life.

The ability to understand the results section of a paper is crucial in deciding to change clinical practice. In order to do this effectively, knowledge of statistics is vital.

Yet, formal training in statistics takes a back seat when it comes to anatomy, physiology, and clinical teaching. When statistics is part of the curriculum, it is often positioned as less important. It gets even worse when taught with mathematical emphasis. Whilst it may be rigorous to teach using equations, a subset of medical students are lost in this effort.

No medical school can look the other way. Data analysis and computational thinking are part of the future of healthcare. I was reminded of this when I came across this article again, after reading it almost two years ago: NYU medical students learning to analyze big data.

Our efforts at the University of Cape Town are growing too. The massive open online course, Understanding clinical research, on the Coursera platform has now had more than 23,000 participants. In the division of General Surgery, I teach the use of data analysis and computational thinking to great effect, using IBM SPSS, Python, Julia, and Mathematica.

It’s time for data science and statistical analysis to take their rightful place in medical school curricula.

Julia for scientific computing, my second Coursera MOOC

October 2016 has seen the launch of my second course on the Coursera massive open online course (MOOC) platform.  Whereas my first course dealt with the statistics used in healthcare research, this one teaches the new Julia language for scientific computing.  You can find it here.

As with other Coursera offerings, you can pay a nominal fee to get a verified certificate from the University of Cape Town, or you can audit the course for free.  Remember, though, that it is always possible to state that you do not have the financial resources to pay for the verified certificate. Coursera will then waive the fee and you will still get your certificate.

You can also learn more about Julia from its home page.  Let me know what you think.

Before I forget, the Jupyter notebooks for the course are available on GitHub.

 


Calculus in the plane

Just to show off what Jupyter notebooks can do, this post will render part 1 of lesson 1 of my lecture series on complex variables. Have a look.