Sharing your machine learning models with others

So, you’ve spent a lot of time and effort in creating your python machine learning model.  The parameters have been tweaked and the metrics look great.

Now what?  How do you share it with others to use?  Well, one easy way is to pickle it.  The pickle library in python allows you to write your model as a file that others can open.  They can then simply enter their own data for prediction.

Pickle your model.

In this YouTube tutorial I create a random forest regressor model, export it as a pickle file, and then import it for use.  Have a look at how easy it all is.
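
As a rough sketch of that workflow (not the exact code from the video), the snippet below trains a random forest regressor, writes it to a pickle file, and loads it again to make predictions. The training data, the file name rf_model.pkl, and the model settings are all made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: 200 samples with 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Train the model (the settings here are illustrative only).
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Export: write the fitted model to a pickle file.
# The file name is an assumption; any path will do.
model_path = Path(tempfile.gettempdir()) / "rf_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Import: whoever receives the file loads it and predicts on their own data.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

new_data = rng.normal(size=(5, 3))
predictions = loaded_model.predict(new_data)
print(predictions)
```

Two things to keep in mind: only unpickle files from people you trust, because loading a pickle can run arbitrary code, and the person loading the model should ideally have the same version of scikit learn that was used to save it.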

K means clustering using python

The scikit learn library for python is a powerful machine learning tool.

K means clustering, which is easily implemented in python, uses geometric distance to create centroids around which our data can fit as clusters.

In the example attached to this article, I view 99 hypothetical patients who are prompted to sync their smart watch healthcare app data with a research team. The data is recorded continuously, but to comply with healthcare regulations, the patients have to actively synchronize the data.  This example works equally well if we consider 99 hypothetical customers responding to a marketing campaign.

In order to prompt them, several reminder campaigns are run each year. In total there are 32 campaigns. Each campaign consists only of one of the following reminders: e-mail, short-message-service, online message, telephone call, pamphlet, or a letter. A record is kept of when they sync their data, as a marker of response to the campaign.

Our goal is to cluster the patients so that we can learn which campaign type they respond to. This can be used to tailor their reminders for the next year.

In the attached video, I show you just how easy this is to accomplish in python. I use the python kernel in a Jupyter notebook. There is also a mention of dimensionality reduction using principal component analysis, also done using scikit learn. This is done so that we can view the data as a scatter plot using the plotly library.
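
To give a feel for the steps, here is a minimal sketch under stated assumptions; it is not the notebook from the video. The response matrix is randomly generated (99 patients by 32 campaigns, with 1 meaning the patient synced after that campaign), and the choice of four clusters is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical data: one row per patient, one column per campaign.
# A 1 means the patient synced their data after that campaign.
rng = np.random.default_rng(42)
n_patients, n_campaigns = 99, 32
responses = rng.integers(0, 2, size=(n_patients, n_campaigns)).astype(float)

# Cluster the patients by their response pattern.
# The number of clusters is an assumption made for this sketch.
n_clusters = 4
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
labels = kmeans.fit_predict(responses)

# Reduce the 32 campaign columns to 2 components so that each
# patient can be drawn as a single point on a scatter plot.
pca = PCA(n_components=2)
coords = pca.fit_transform(responses)

# Show the cluster label and 2D position of the first five patients.
for label, (x, y) in zip(labels[:5], coords[:5]):
    print(label, round(x, 2), round(y, 2))
```

The two columns of coords can be passed as the x and y values of a scatter plot, with labels used to colour the points. With real data, the rows of kmeans.cluster_centers_ indicate which campaigns each cluster tends to respond to.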

Video

K-means clustering

Julia for scientific computing, my second Coursera MOOC

October 2016 has seen the launch of my second course on the Coursera massive open online course (MOOC) platform.  Whereas my first course dealt with the statistics used in healthcare research, this one teaches the new Julia language for scientific computing.  You can find it here.

As with other Coursera offerings, you can pay a nominal fee to get a verified certificate from the University of Cape Town, else you can audit the course for free.  Remember, though, that it is always possible to state that you do not have the financial resources to pay for the verified certificate and Coursera will waive the fee and you will still get your certificate.

You can also learn more about Julia from their home page.  Let me know what you think.

Before I forget, the Jupyter notebooks for the course are available on GitHub.

 


Our road to patient-centred, competency-based education

So, how can an academic surgical unit benefit from the computer code development skills of people such as Wes McKinney of pandas fame or the educational skills of an engineering professor such as Lorena Barba of Numerical MOOC (numerical massive open online course) fame? Answer: A lot. This post is about our efforts to transition from antiquated to more modern forms of surgical training and assessment, all with the help of one of the best software projects out there, Project Jupyter. This is Groote Schuur after all!

The teaching and assessment paradigm has stood for many, many decades. Do four years of surgical rotations, watch what your superiors do, present on ward rounds, go to the clinic, take calls, assist in theatre, do some cases, attend (most) academic meetings (read: watch yet another PowerPoint presentation), pass three exams. Presto. Specialist. That’s how it’s done now, that’s how it was done in the 00’s, the 1990’s, 80’s, 70’s, 60’s, 50’s, 40’s,… You get the point. Hey, depending on which source you read, it was in the 40’s that the overhead projector was first used by the military in World War II. If you think about it, an overhead transparency projector is just PowerPoint without a computer. If you slip in one transparency while the other is still showing, it’s just like a slide transition!

Depending on your working environment, you might be surrounded by people in full support of this form of education. It has always worked that way. Why change now? Well, as the argument goes, by that logic bloodletting should still be all the rage. You will note that in contrast to medical education, actual medicine has come on in leaps and bounds. We buy into the new paradigm that is evidence-based medicine. So why is it so difficult to accept and, even more difficult, to practice evidence-based medical education?

Some of us are fortunate enough to work in countries where there are national efforts and frameworks in place to motivate for change. Have a look at the CanMEDS program in Canada. Two of the key concepts in their program are patient-centred care and competency-based assessment. Without going into the detail of their programs, I want to concentrate on these two aspects. Reason being, it gives us a practical starting point. For those unfortunate enough not to work in countries with national frameworks and support, small steps have to be taken.

So what solutions have we implemented in the Acute Care Surgery Unit at Groote Schuur Hospital? First and foremost, involve the patients. They are at the centre of what we do after all. Why should they have no say in the evaluation of their care? Fortunately, validated tools are available when you turn to the literature. At this time we use the Jefferson scale of patient’s perception of physician empathy. Moving on to competency assessment, there is the Ward Round Assessment tool amongst many others. Point being, we are moving away from the subjective end-of-rotation scorecard that takes 20 seconds to complete by marking every question either average or above average. You know the one: (1) Knowledge, (2) Surgical skill, (3) Punctuality…

Now, the Acute Care Surgery Unit is brand new (you can learn more about us from my talk at this year’s Association of Surgeons of South Africa conference here). We certainly have no research assistants, money, or personnel to help us in our efforts towards patient-centred, competency-based education. This whole process has to be self-driven. Solutions to the problem? Well, that’s the easy bit. The world has changed over the last few years. No longer is knowledge locked away behind expensive paywalls. If you want to learn something, go online. For me, it all started with the Massachusetts Institute of Technology (MIT). Their open courseware opened a whole new world to me. MIT and the massive open online course platforms such as Coursera (to which I will shortly contribute), edX and FutureLearn (to name but a few) are handing the keys of knowledge to all humankind.

This brings me to Project Jupyter and computer languages such as Python (through IPython) and Julia. If you have no access to software development teams and big budget research units, do yourself a favor and search for tutorials on these projects. You will find so many wonderful men and women, going out of their way to empower you with these tools. Even a lowly surgeon such as myself has online tutorials. Have a look at these:
The Klopper Lectures on Julia
Mini project: Medical research using Julia

Back to what this post is all about.  Here, you will find a link to some of our results using Project Jupyter (GitHub). To protect patients and trainees, the data have been altered and are not a true reflection of anyone or any given period. What it does show, though, is how easy it is to use data to properly guide the training of our residents; and this is just our first small step.

The Julia programming language

So, I’ve started a new playlist on my YouTube® channel called The Julia Computer Language.  For now, lessons 1 and 2 are up and as (limited) time allows, I’ll add some more.

Julia is a rather new programming language for technical or scientific computing.  You will find out a lot more about it on the Julia homepage.  Unfortunately, there are not a lot of tutorials on Julia out there and if you do find them, most are by computer scientists for computer scientists.  Perhaps rightly so, as Julia is a fantastic tool, capable of some pretty impressive things when it comes to scientific computing.  It prides itself on being as simple and easy to use as Python, with speeds approaching those of C or Fortran.  It is indeed much speedier than other mathematical languages such as Matlab® and Mathematica®.

On top of this, I believe that it makes for an excellent language for a novice starting off, learning how to code.  This is especially true for those who plan to go into the fields of science and technology.  Even if you move on to other languages, Julia will stand you in good stead.  It might spoil you, though, which means you’ll come running straight back to it.

I do stick to IPython for my medical statistics, but Julia works perfectly here too.  I’ve made a lecture on the topic, which you can view here.

Go on, give Julia a spin.  There is just something about it that speaks to me.  A certain elegance and power.  Well done to the brilliant minds that came up with it and to all those who are continuing its development.

You can write Julia code in the cloud using JuliaBox, so no need to install anything at all.  At this time, I am having tremendous problems getting it (IJulia) to run in Jupyter, so much so that I am using the very nice Juno development environment.  In upcoming lessons I will look at installing Julia, Jupyter, and Juno, but for now, you can follow along without any downloads or installs.  Just use JuliaBox and your Google® account to sign in.  The notebook files that I use are in a zip file on this page.