Life of: A Chief Data Scientist

Name: Derek Lin

Job title: Data scientist, Exabeam

Bio: Derek Lin is a seasoned data scientist passionate about the art of building data-driven defences against cyber threats and fraud. His current and prior machine learning research covers behavior-based security analytics such as malware detection and insider threat detection, risk-based online banking fraud detection, data loss prevention, voice biometrics security, and speech and language processing. He holds numerous patents and has published multiple research papers.

What was your route into cybersecurity?

With an electrical engineering degree, I started my career in speech and language processing research, which gave me a solid grounding in data analytics. Years later, by chance, I landed a job in online banking fraud detection. I found the idea of detecting security events fascinating. From there, it was a natural transition to expand my research into the wider field of cybersecurity.

Tell me in one sentence what your job is about 

My job is about helping enterprises find the bad guys with a data-driven approach: proactively monitoring and analyzing their logs.

What’s the best thing about your job?

There is a clear sense of purpose in doing what I do in cybersecurity: finding the bad guys. What better way is there to feel good about oneself than doing work to right the wrongs? Nothing beats the satisfaction of catching a real-world security event with a data science module I just built!

Derek Lin, Exabeam

And what’s the worst?

Well, when that same data science module turns out not to work well in the real world.

What’s the most misunderstood thing about information security?

Currently, data science is all the rage in security analytics. I think there is a perception that data science or machine learning can magically solve many security problems. Unfortunately, data science alone is not a magic bullet; security expertise and processes remain indispensable in information security.

What’s your biggest professional regret? 

I did not realize the value of having a good mentor early in my career. From choosing the right company to work for to avoiding the wrong one, there were learning opportunities I missed along the way.

What are you most proud of?

A few years ago, my team used machine learning algorithms to detect security events for a client. It turned out that what we had detected were malware-infected medical devices beaconing home and sending data out. This was a major discovery, and a couple of months later the Wall Street Journal ran a front-page article on a similar security breach involving medical devices at a major healthcare provider. It was a vindication of our work. I couldn't be more proud of that.

If you could change one thing about the information security sector, what would it be?

I would change the marketing brochures and slides from security vendors, which make tantalising claims but are often confusing to would-be buyers. What interests me most is a vendor's detection accuracy, as detecting threats is ultimately what a security product is for. To provide transparency and to encourage innovation, I sometimes wish there were a standards body that periodically provided test data for peer evaluation and benchmarking. Academic communities have used standardized tests to great success in image and speech recognition, so why not in cybersecurity?


Tell me about a time you screwed up

Building a successful machine learning system starts with understanding your users' needs, so that you can gauge whether the system is feasible in the first place. A few years ago, I took on a classification project and later found out that the end user had insisted on a zero false positive rate. Anyone with even basic data science exposure knows there is always a trade-off between the detection rate and the false positive rate in a classification system. Demanding a zero false positive rate is not practical; it amounts to barely attempting any classification at all. In the end, the deployed model, with its non-zero false positive rate, got such a bad rap that I have since learned to question end users' assumptions and requirements very carefully before embarking on a project.
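The detection-rate versus false-positive-rate trade-off Lin describes can be illustrated with a small threshold sweep. This is a minimal sketch, not Exabeam's or the interviewee's method: the scores and labels are hypothetical, the helper names `rates_at_threshold` and `sweep` are made up for illustration, and it assumes an event is flagged when its score is at or above the threshold. Because one benign event (0.85) outscores three malicious ones, forcing the false positive rate to zero caps the detection rate at 0.4, while accepting one false positive in seven lets the detector catch every malicious event.

```python
from typing import List, Tuple


def rates_at_threshold(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Return (detection_rate, false_positive_rate) when flagging every score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def sweep(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Evaluate every observed score as a threshold, plus +inf (which flags nothing)."""
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    return [(t, *rates_at_threshold(scores, labels, t)) for t in thresholds]


# Hypothetical classifier scores (higher = more suspicious); 1 = malicious, 0 = benign.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.85, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

curve = sweep(scores, labels)
for t, dr, fpr in curve:
    print(f"threshold={t:>5.2f}  detection_rate={dr:.2f}  false_positive_rate={fpr:.2f}")

# Best detection rate achievable while keeping the false positive rate at exactly zero.
best_zero_fp_detection = max(dr for _, dr, fpr in curve if fpr == 0.0)
print("best detection rate with zero false positives:", best_zero_fp_detection)
```

Lowering the threshold moves along the curve toward higher detection at the cost of more false alarms; the right operating point is a business decision, which is why it pays to settle it with end users before building the model.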

What advice would you give to someone starting out in the information security industry in 2017? 

The Big Data era has already arrived. Security and data analytics are converging. As a junior security analyst, immerse yourself in as much data science as you can. The effort will pay off!

Who do you really admire in the industry? 

On the security side, who wouldn't think first of Ron Rivest, Adi Shamir and Leonard Adleman, the three names behind the RSA encryption algorithm?

On the data science side, I’d pick Andrew Ng, a computer scientist, for his ability to explain deep mathematical algorithms so well. He also co-founded Coursera, which made data science accessible to many.

What alternative career path would you have liked to pursue?

I’d like to do research work in the area of understanding the process of human language acquisition, drawing ideas and lessons from physical science into computer science. 
