I attended the keynote at IBM’s CASCON conference in Toronto today. Judy Huber, who directs the IBM Canada software lab, kicked off the session by reminding us that IBM software development has been happening in Canada since 1967 and continues to grow, and by stressing the importance of collaboration between the research and industry communities. She introduced Joanna Ng, head of research at the lab, to congratulate the winners of the most influential paper award from CASCON 2001 (that date is not a typo, it’s a 10-year thing): Svetlana Kiritchenko and Stan Matwin for “Classification with Co-Training” (on email classification).
The main speaker was Sal Vella, VP of architecture and technology within the IBM software group, talking about technologies to build solutions for a smarter planet. Fresh from the IOD conference two weeks ago, I was all primed for this; there was a great booth at IOD that highlighted “smarter technology” with some interesting case studies. IBM’s smarter planet initiative is about technologies that allow us to do things that we were never able to do before, much of which is based on the enormous volume of data constantly produced by people, devices and systems. Consider electricity meters, like the one that you have in your home: it used to be that these were read once per month (if you were lucky) by a human, and the results entered into a billing system. Now, smart meters are read every 15 minutes to allow for time-of-use billing that rewards people for shifting their electricity usage away from peak periods. Analytics are being used in ways that they were never used before, and he discussed the popular Moneyball case of building a sports team based on player statistics. He also spoke about an even better use of analytics to create “solutions for a partying planet”: a drinks supplier predicting sports game outcomes to ensure that the pubs frequented by the fans of the teams most likely to win had enough alcohol on hand to cover the ensuing parties. Now that’s technology used for the greater good.
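As a toy illustration of why 15-minute interval reads enable this kind of billing, here’s a sketch with entirely hypothetical rate bands and prices (real tariffs vary by utility; none of these numbers come from the talk):

```python
# Hypothetical time-of-use rates in cents/kWh -- illustrative only.
RATES = {"off_peak": 8.2, "mid_peak": 11.3, "on_peak": 17.0}

def period(hour):
    """Map an hour of the day to an illustrative weekday TOU band."""
    if 7 <= hour < 11 or 17 <= hour < 19:
        return "on_peak"
    if 11 <= hour < 17:
        return "mid_peak"
    return "off_peak"

def bill_dollars(reads):
    """reads: (hour, kwh) pairs aggregated from interval meter data."""
    return sum(kwh * RATES[period(hour)] for hour, kwh in reads) / 100.0

# The same 3 kWh costs less when shifted off-peak:
daytime = bill_dollars([(8, 1.0), (12, 1.0), (18, 1.0)])
overnight = bill_dollars([(1, 1.0), (2, 1.0), (3, 1.0)])
```

With a single monthly read, only total consumption is known, so every kWh has to be priced the same; interval data is what makes the price signal (and the reward for shifting load) possible.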
Many big data and analytics problems that were previously unmanageable are now becoming reasonable targets, most of which could be considered event-based: device instrumentation, weather data, social media, credit card transactions, crime statistics, traffic data and more. There are also some interesting problems in determining identity and relationships: figuring out who people really are even when they use different versions of their name, and who they are connected to in a variety of different ways that might indicate potential for fraud or other misrepresentation. Scary and big-brotherish to some, but undeniably providing organizations (including governments) with deeper insights into their customers and constituents. If those who complain about governments using this sort of technology “against” them would learn how to use it themselves, the tables might be turned as we gain insight into how well government is providing services to us.
We heard briefly from Charles Gauthier, acting director of the Institute for Information Technology at the National Research Council (NRC) Canada. NRC helped to create the CASCON conference 21 years ago and continues to sponsor it; the institute supports research in a number of areas that overlap with CAS and the other researchers and exhibitors presenting here.
The program chairs, Marin Litoiu of York University and Eleni Stroulia of the University of Alberta, presented awards for the two outstanding papers from the 22 papers at the conference:
- “Enhancing Applications Robustness in Cloud Data Centres” by Madalin Mihailescu, Andres Rodriguez and Cristiana Amza of University of Toronto, and Dmitrijs Palcikovs, Gabriel Iszlai, Andrew Trossman and Joanna Ng of IBM Canada
- “Parallel Data Cubes on Multi-Core Processors with Multiple Disks” for best student paper, by Hamidreza Zaboli and Frank Dehne of Carleton University
We finished with a presentation by Stan Matwin of the University of Ottawa, co-author of the most influential paper on email classification from the CASCON of 10 years past (his co-author is due to give birth on Wednesday, so decided not to attend). It was an interesting look at how the problem of email classification has continued to grow in the past 10 years; systems have become smarter since then, and we have automated spam filtering as well as systems for suggesting actions to take (or even taking actions without human input) on a specific message. Their paper described co-training, in which multiple classifiers, each trained on a different view of the data, are used in concert to provide an overall classification for email messages. For example, two messages might both use the word “meeting” and a specific time in the subject line, but one might include a conference room reference in the body while the other references the local pub. Now, I often have business meetings in the pub, but I understand that many people do not, so I can see the value of such a co-training method. In 2001, they concluded that co-training can be useful, but is quite sensitive to its parameters and the learning algorithms used. Email classification has progressed since then: Bayesian (and other) classifiers have improved drastically; data representation is richer (through the use of meta formats and domain-specific enrichment), allowing for easier classification; social network and other information can be correlated; and there are tailored solutions for specific email classification applications such as legal discovery. Interesting to see this sort of perspective on a landmark paper in the field.
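A minimal sketch of the co-training idea, assuming two views of each message (subject tokens and body tokens), simple add-one-smoothed frequency classifiers, and entirely made-up example data — this is my illustration of the general technique, not the authors’ algorithm:

```python
import math
from collections import Counter

LABELS = ("work", "social")

# Each example is (subject_tokens, body_tokens, label); label is None if unknown.
def train(examples, view):
    """Count token frequencies per label for one view (0 = subject, 1 = body)."""
    counts = {label: Counter() for label in LABELS}
    for ex in examples:
        counts[ex[2]].update(ex[view])
    return counts

def log_prob(counts, tokens, label):
    """Add-one-smoothed log-likelihood of a token list under a label."""
    vocab = len({t for c in counts.values() for t in c}) + 1
    total = sum(counts[label].values())
    return sum(math.log((counts[label][t] + 1) / (total + vocab)) for t in tokens)

def predict(counts, tokens):
    return max(LABELS, key=lambda lbl: log_prob(counts, tokens, lbl))

def margin(counts, tokens):
    """Confidence: gap between the best and second-best label scores."""
    scores = sorted(log_prob(counts, tokens, lbl) for lbl in LABELS)
    return scores[-1] - scores[-2]

def co_train(labeled, unlabeled, rounds=5):
    """Each round, each view's classifier labels its most confident
    unlabeled message and adds it to the shared labeled pool."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                return labeled
            counts = train(labeled, view)
            best = max(unlabeled, key=lambda ex: margin(counts, ex[view]))
            unlabeled.remove(best)
            labeled.append((best[0], best[1], predict(counts, best[view])))
    return labeled

# Made-up messages: "meeting ... conference room" vs. "meeting ... pub".
labeled = [
    (["meeting", "3pm"], ["conference", "room", "b"], "work"),
    (["meeting", "6pm"], ["pub", "pints"], "social"),
]
unlabeled = [
    (["meeting", "9am"], ["conference", "room", "agenda"], None),
    (["drinks", "friday"], ["pub", "round"], None),
]
pool = co_train(labeled, unlabeled)
```

The point of the two views is exactly the subject/body example above: when the subject-line classifier cannot separate the two “meeting” messages, the body classifier can, and each view’s confident labels become extra training data for the other.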
I’m not sticking around for any of the paper presentations, since the ones later today are a bit out of my area of interest, and I’m booked the rest of the week on other work. However, I have the proceedings so will have a chance to look over the papers.