News

4/28 | Project begins


Contents

Week 1 - Week 2 - Week 3 - Week 4 - Week 5 - Week 6 - Week 7 - Week 8 - Week 9 - Weeks 10 and 11 - Week 12 - Week 13 - Week 14 - Weeks 15 and 16

Weeks 15 and 16: August 18th - August 29th

These last two weeks were spent finishing up my report. I spent a lot of time writing, and then submitted a copy of the report to Joelle. After making a few changes she suggested, my report is finished! I learned a lot this summer, and am very grateful to Joelle, the CDMP, and NSERC for this experience.


Week 14: August 11th - August 15th

Given that no approaches seem particularly promising, I mainly spent this week documenting all the work I've done. I spent a lot of time re-running programs to get more specific results and graphs. This week taught me the most important lesson I've learned so far: it's much easier to document your work while you're doing it than it is several weeks later.


Week 13: August 4th - August 8th

I tried to improve the results from the sets of trees that I built last week, by weighting them with some more sophisticated machine learning algorithms. Unfortunately, even with this adjustment the results are poor. The algorithm isn't even able to classify the training set, which means that the outputs of the trees do not contain enough information for predictions to be made.

I also followed up on the cross-validation trees. It turns out that no matter what parameters I use, the trees predict everything as interictal. That left me with no choice but to reduce the class imbalance: I threw out a bunch of interictal data until my data was about 30 percent preictal. Once I did that, the trees no longer classified everything as interictal. However, their performance was no better than a random classifier. Oh well...


Week 12: July 28th - August 1st

This week I tried out my idea of building a set of decision trees for each patient and using them together to predict seizures for a new patient. The program works, but the results don't seem to be any better than those of the previous methods we've tried. I might improve this by changing how a new patient learns from previous patients. Right now I'm using only linear regression for this learning, which is fairly limited, so I may instead try the more complex algorithms provided by the Weka software.
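To make the "linear regression" step concrete, here is a minimal sketch in Python with NumPy. It assumes each previously trained per-patient set of trees produces one score per time window, and that the new patient has some labeled windows to fit on. The function names, the 0/1 target coding, and the intercept term are all assumptions for illustration, not the code used in the project.

```python
import numpy as np

def fit_patient_weights(model_outputs, targets):
    """Fit least-squares weights over the outputs of per-patient models.

    model_outputs: (n_windows, n_models) scores on the new patient's windows.
    targets: (n_windows,) values, e.g. 1.0 for preictal and 0.0 for interictal.
    Returns one weight per model, with the intercept as the last entry.
    """
    A = np.column_stack([model_outputs, np.ones(len(targets))])
    weights, *_ = np.linalg.lstsq(A, np.asarray(targets, dtype=float), rcond=None)
    return weights

def combine(model_outputs, weights):
    """Apply the fitted weights to new model outputs."""
    A = np.column_stack([model_outputs, np.ones(len(model_outputs))])
    return A @ weights
```

A linear combination like this can only rescale and add the existing outputs, which is one way to read "fairly limited" above.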

I also worked on doing the cross-validation to pick good settings for the trees. To do this, I need to build trees with different settings and compare them. I'm experimenting with 200 different settings, for 21 different patients, so the trees are taking a few days to build and haven't been finished yet.
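The size of this search is easy to see in a small sketch. The three kinds of settings below (number of trees, depth, instances per leaf) come from the earlier entries, but the specific values are hypothetical, chosen only so that they multiply out to 200 combinations. The `evaluate` callback is also an assumption: it stands in for whatever builds the trees and scores them on held-out seizures.

```python
from itertools import product

# Hypothetical value ranges: 5 * 8 * 5 = 200 combinations.
N_TREES = [10, 25, 50, 100, 200]
MAX_DEPTH = [2, 3, 4, 6, 8, 12, 16, 24]
MIN_LEAF = [1, 5, 10, 25, 50]

def settings_grid():
    """Yield every combination of the settings as a dict."""
    for n_trees, max_depth, min_leaf in product(N_TREES, MAX_DEPTH, MIN_LEAF):
        yield {"n_trees": n_trees, "max_depth": max_depth, "min_leaf": min_leaf}

def pick_best(patients, evaluate):
    """Return the setting with the highest mean score across patients.

    evaluate(setting, patient) is supplied by the caller; it should build
    trees with `setting` and score them on that patient's held-out seizures.
    """
    def mean_score(setting):
        return sum(evaluate(setting, p) for p in patients) / len(patients)
    return max(settings_grid(), key=mean_score)
```

With 200 settings and 21 patients, `evaluate` runs 4,200 times, which is consistent with the trees taking days to build.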

Weeks 10 and 11: July 14th - July 25th

I spent the past couple weeks learning enough C++ to write this tree algorithm. It finally runs, which is very satisfying. It still needs work, though: when it classifies time windows, it predicts that everything is interictal. It does this because there is a class imbalance in the data: most windows are interictal, so predicting interictal everywhere gives fairly good accuracy. A classifier like that isn't useful in any way, though, so I'm going to try to fix this. I could eliminate the imbalance by throwing out interictal data, but I'm reluctant to get rid of any information. Instead, I'm going to try to change the parameters of the trees. I think part of the problem is that when the trees learn from 20 different patients, there's so much information that they grow very complex, and then this complexity doesn't extend to a new patient. By setting aside some seizures, I can use these for cross-validation, to pick better settings for the trees (such as how many there are, how deep they grow, and how many instances form a leaf).
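The effect of the imbalance can be shown with a few lines of Python. This is only an illustrative sketch: the function name is made up, and the 95/5 split in the usage note is an invented example, not a number from the data.

```python
def accuracy_and_sensitivity(true_labels, predicted, positive="preictal"):
    """Return (accuracy, sensitivity), where sensitivity is the fraction
    of truly positive windows that were predicted as positive."""
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    positives = [p for t, p in zip(true_labels, predicted) if t == positive]
    accuracy = correct / len(true_labels)
    sensitivity = (sum(p == positive for p in positives) / len(positives)
                   if positives else float("nan"))
    return accuracy, sensitivity
```

For instance, with 95 interictal and 5 preictal windows, predicting "interictal" everywhere scores 95 percent accuracy while catching none of the preictal windows, so accuracy alone hides the failure.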

I'm also going to try implementing a new idea I had, which is to impose some structure on the trees. Instead of just building trees from a huge mess of data, I'll build one for each patient. Then for any new patient, we can try to see which of the previously learned patients he is most similar to, and use only those trees for prediction.

Week 9: July 7th - July 11th

This wasn't a particularly productive week. I've been looking around for code to help me build the tree algorithm, and I found that Arthur, another student in the lab, had written a program that would be really helpful. However, it's in C++, which I have no experience with, so I've largely spent my time trying to learn the basics and figure out how to make the changes I need. I'm especially looking into MEX-files, which will let me use his C++ programs directly from my Matlab code. I also spent many, many mind-numbing hours helping Keith label seizures in recordings from rat brain slices. Hopefully, that data will help their side of the project, which is to develop patterns of brain stimulation that can reduce seizure occurrence.

Week 8: June 30th - July 4th

This week I kept working on building a tree that learns from all patients and then applies this knowledge to a single patient. It turns out that it's a little more difficult to implement than I had hoped, so I'm still working on it.

In the meantime, I wrote several little programs to help me generate results automatically. I did this partly with Matlab, but I also started learning Python to help me speed things up. All that code is now written, and I used it to look at all the patients who experienced at least five seizures. I reduced the amount of data using PCA (see week 6), and I tried applying PCA both across all patients and to each patient independently. The results were not significantly above chance, and the type of PCA didn't seem to influence anything, so I think it's time to move on and focus on other approaches, such as the tree algorithm I'm trying to write.

Week 7: June 22nd - June 27th

So far I've spent a lot of time on two specific patients, one with bad results and one with good. This really isn't enough to measure performance: it's hard to tell whether our methods are working and one patient just happens to be difficult, or whether our methods are not very effective and we happened to get lucky with one patient.

Now I'm going to try to test everything on the large, 21-person database. The problem with this database is that many of the patients don't have many seizures - some have as few as two. Normally we try to predict a patient's new seizures after looking at previous seizures, but when there are only two this isn't enough information.

Our new approach will be to try to consider all patients at the same time, so that we have more seizures to use as background information. However, we don't actually want to train our algorithms on other patients' data, since it's too different from whichever patient we want to look at. This week we began examining methods that might allow us to learn from other patients while still focusing on a specific patient for prediction.

The general idea is to learn a model of the problem from all the patients, and then use a specific patient's information to actually make decisions. I read a couple papers to find out more about this: 'On Discriminative vs. Generative classifiers: A comparison of logistic regression and naive Bayes' (Ng and Jordan) and 'Semi-supervised learning with trees' (Kemp et al.).

I spent the week trying to set up a system that looks at all 21 patients to build a model of the problem. It does this by determining the structure of a decision tree. It then turns to a single patient to learn what values it should predict at any given node of the tree. I'm still working on this and hopefully I'll have some results next week.
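One way to picture this split between shared structure and patient-specific values is the Python sketch below. It is a toy under stated assumptions: the tree is a hypothetical nested dict of feature/threshold splits, and each leaf's value is set by a majority vote over one patient's windows. The entry above doesn't say how the values are actually chosen, so the voting rule and the default label are illustrative only.

```python
from collections import Counter, defaultdict

def find_leaf(tree, x):
    """Follow the split tests of a nested-dict tree until a leaf id is reached."""
    node = tree
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

def fit_leaf_values(tree, patient_X, patient_y, default="interictal"):
    """Keep the shared structure; set each leaf's prediction to the majority
    label of this patient's windows that fall into it. Leaves the patient
    never reaches fall back to `default`."""
    votes = defaultdict(Counter)
    for x, y in zip(patient_X, patient_y):
        votes[find_leaf(tree, x)][y] += 1
    leaf_values = {leaf: c.most_common(1)[0][0] for leaf, c in votes.items()}
    return lambda x: leaf_values.get(find_leaf(tree, x), default)

# Hypothetical structure, as if learned from all patients.
shared_tree = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": 0},
    "right": {"feature": 1, "threshold": 2.0,
              "left": {"leaf": 1}, "right": {"leaf": 2}},
}
```

Calling `fit_leaf_values(shared_tree, X, y)` with one patient's windows returns a prediction function that uses the shared splits but that patient's own leaf labels.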

Week 6: June 2nd - June 6th

This week I decided to try to reduce the amount of information that I examine from the data. Previously, I was taking 680 features from each data point, and that was a huge amount for the learning algorithms to deal with. This week, I began using principal components analysis (PCA). This method looks at the features and creates a new set of features that capture the data more efficiently. I managed to reduce the number of features to 10, and then the learning algorithms gave much better results. They showed a clear change approximately 30 seconds before seizure onset. This was very encouraging, because it indicated that there is a discernible change in the data before a seizure begins. I returned to doing classification, to see if PCA would also be helpful for that task. It does seem to improve results somewhat for one patient, whose seizures seem to be very clearly predictable. However, there was no improvement for a second patient, whose seizures were more difficult to predict. One interesting issue might be to try to look into why one patient yields such different results from the next - what is it about his seizures that is predictable, but absent in the other patient?
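For readers unfamiliar with PCA, the reduction from 680 features to 10 can be sketched in a few lines of Python with NumPy. This is a generic textbook version using the singular value decomposition, not the code used for these experiments, and the function name and return values are assumptions.

```python
import numpy as np

def pca_reduce(X, n_components=10):
    """Project the rows of X (samples x features) onto the top principal
    components. Returns the reduced data, the components, and the mean."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA is defined on mean-centered data
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean
```

New windows would be reduced with the stored mean and components, as in `(x_new - mean) @ components.T`, so that training and test data share the same projection.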

Week 5: May 26th - May 30th

I spent this week mainly on trying to improve the regression results. We thought that one possible problem was that the time window we're looking at is too large. I was using regression to look at the 30 minutes before a seizure and try to predict the time left. There might not really be a difference between, say, 20 minutes versus 30 minutes before a seizure, so this might not be a reasonable goal. Instead, I tried looking at smaller time windows, 5 minutes and 2 minutes. I also became more systematic about looking at the results, and made sure to combine results from several seizures instead of just looking at one. I tried several different learning algorithms, and two of them seemed to show some ability to predict the time left. Next week I'll try to improve these results by narrowing down which features of the data are important, and also using filtered data instead of the noisy signals I've been using so far.

Week 4: May 20th - May 23rd

This week I continued by working on patients from the 21-person database. The approaches that I used earlier gave worse results with these patients. This is probably due to a couple of differences in this database. First, these patients have recordings from only 6 electrodes, while the others had 20 or more, so there's much less information to work with. Also, the recordings aren't continuous: they've been chopped up and mixed around and there are several gaps. I was a little sloppy in dealing with this, so I'm going to go back and try to determine a better approach. I'm also thinking about new features to look at in the data, to make up for the fact that there are so few channels.

I also spent a lot of time trying to get rid of the noise in the data. The information in the recordings is overwhelmed by the noise from electrical sources and movement. I tried a couple different things to reduce this problem: I started subtracting the average of all channels from each channel at every time point, to eliminate artefacts that were common to all channels. I also added a function to normalize each channel before extracting its features.
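Both steps are simple to express in code. The Python sketch below shows the idea with plain lists. The function names are made up, and since the entry doesn't say which normalization was used, scaling each channel to zero mean and unit variance is an assumption.

```python
from statistics import mean, pstdev

def common_average_reference(recording):
    """recording: list of channels, each a list of samples of equal length.
    Subtract the across-channel mean from every channel at each time point."""
    n_samples = len(recording[0])
    avg = [mean(ch[t] for ch in recording) for t in range(n_samples)]
    return [[ch[t] - avg[t] for t in range(n_samples)] for ch in recording]

def normalize_channel(channel):
    """Rescale one channel to zero mean and unit variance (assumed z-score)."""
    mu = mean(channel)
    sigma = pstdev(channel)
    if sigma == 0:
        return [0.0 for _ in channel]  # a flat channel carries no signal
    return [(x - mu) / sigma for x in channel]
```

After re-referencing, the channels sum to zero at every time point, so anything that appeared identically on all channels is gone.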

The added processing seemed to improve results for some patients but worsen them for others. It wasn't clear whether there was a significant difference overall. I would have more certainty about my results if I could test things several times instead of just once, so next I plan to write some scripts to automate the way I use Weka to compute results.

I also started doing regression instead of classification: instead of trying to say whether a window of time was preictal or not, I tried to take a window and predict the time left until the next seizure. This really didn't work very well. I'm not sure what to do about that, so I plan to discuss it with Joelle.
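To make the regression target concrete, here is a small Python sketch that labels each window with the time remaining until the next seizure. The function name, the use of seconds, and the `horizon` cutoff (windows further than this from a seizure get no target) are assumptions for illustration.

```python
def time_to_next_seizure(window_starts, seizure_onsets, horizon=1800.0):
    """For each window start time (in seconds), return the time remaining
    until the next seizure onset, or None if no onset follows within
    `horizon` seconds."""
    onsets = sorted(seizure_onsets)
    targets = []
    for t in window_starts:
        upcoming = [s - t for s in onsets if s > t]
        remaining = min(upcoming) if upcoming else None
        if remaining is not None and remaining > horizon:
            remaining = None
        targets.append(remaining)
    return targets
```

Windows labeled None would simply be left out of the regression training set.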

Week 3: May 12th - May 16th

This week I worked on using the same techniques on the data for another patient. The results turned out to be much better for this patient, so it's good to know that the machine learning methods can generalize. I'm going to continue by checking the results for a much larger database, of 21 patients.

I also spent some time working with Keith to develop tools that will allow us to test methods much more efficiently. Keith has now written programs to extract the data from its raw form, and I wrote some functions that help me turn the features of the data into forms that are accepted by Weka, the software that we're using.
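Weka reads data in its ARFF text format, so a conversion function of this kind might look like the Python sketch below. It is not the code described above (the language isn't stated there); the function name, the relation name, and the choice of numeric features with a single nominal class attribute are assumptions.

```python
def to_arff(relation, feature_names, rows, labels, classes):
    """Return ARFF text with numeric features and one nominal class attribute.

    Names are written as given, so they should not contain spaces or commas.
    """
    lines = [f"@relation {relation}", ""]
    for name in feature_names:
        lines.append(f"@attribute {name} numeric")
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    for row, label in zip(rows, labels):
        lines.append(",".join(str(v) for v in row) + "," + label)
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with an `.arff` extension gives something Weka can open directly.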

Although I've already had a little experience with Weka, I didn't know much about its many different options, so I spent some time reading up on it. I read Chapter 8 of the book Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, by Ian H. Witten and Eibe Frank.

In addition to this work, I also attended a couple lab meetings. In one we discussed the paper 'Responsive Cortical Stimulation for the Treatment of Epilepsy', by Sun et al., and in a general lab meeting we listened to two talks on Bayesian reinforcement learning.

Week 2: May 5th - May 9th

We began the week by meeting to talk about the Gardner et al. paper. They used an interesting method to detect seizures, but a closer look showed that their results weren't great, so we decided not to try their approach for now.

I also looked at the code for the Random Forests, and found some differences between it and a similar method that another student, Arthur, used successfully. He's going to try to make some changes to the code and try out the similar algorithm, extremely randomized trees.

I also tried running some of the same algorithms on a version of the data which was extremely undersampled, so that there were as many seizure time windows as non-seizure. In this case, the Random Forests do not become worse with extra features, so it's possible that the class imbalance has something to do with the decrease in performance.
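Undersampling of this kind can be sketched in a few lines of Python. The function name, the label strings, and the `target_fraction` parameter are assumptions; the default of 0.5 corresponds to keeping as many non-seizure windows as seizure windows.

```python
import random

def undersample(windows, labels, target_fraction=0.5, minority="seizure", seed=0):
    """Randomly drop majority-class windows until the minority class makes
    up roughly `target_fraction` of the data. Original order is preserved."""
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be between 0 and 1")
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    # Number of majority windows that gives the requested minority fraction.
    keep = round(len(minority_idx) * (1 - target_fraction) / target_fraction)
    keep = min(keep, len(majority_idx))
    rng = random.Random(seed)
    kept = sorted(minority_idx + rng.sample(majority_idx, keep))
    return [windows[i] for i in kept], [labels[i] for i in kept]
```

The cost is that most of the non-seizure data is discarded, so results on the balanced set say little about performance on the original, imbalanced recordings.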

Finally, I spent a lot of time trying to get the EEG data for different patients to see if these results generalize. After many hours spent dealing with huge, corrupted files, I now have 20 hours of data for another patient, and next week I will try the same approaches on the new dataset.

Week 1: April 28th - May 2nd

I started off by meeting with Joelle to talk about possible directions for the project. Right now there are several different possible options, so it's hard to pick exactly what to do next. We have preliminary results from a single patient, doing prediction with several different methods. One thing we've noticed is that the Random Forests algorithm performs worse as more features are added. This is unexpected, so I'm going to look into it further.

I also read a couple papers that might be useful for the project. One was "One-class novelty detection for seizure analysis from intracranial EEG", by Gardner et al. The other was "An introduction to adaptive seizure suppression", by Keith Bush, a post-doc in the lab.

Also, I learned a little about how to make websites, and then made this one.
