I received my BS and MS in Computer Science from Stanford University in 2013, where I
worked with Andrew Ng and Daphne Koller in the Stanford AI Lab.
In 2012, I joined Coursera as its third employee. For two years I served as
Director of Partnerships and Course Operations, building a team of 25
people that worked with thousands of instructors and staff from 100+ schools; I
then became the product manager in charge of university-facing products.
I returned to Stanford in 2015, working for a year with Anshul Kundaje on
computational biology. In 2016, I started my PhD in Computer Science at Stanford.
I come from sunny Singapore. I spent two years as an armored
infantry officer in the Singapore Armed Forces and a month as a safari guide trainee
in South Africa.
I play the piano and the computer, and I
also spent a quarter playing, badly and loudly, the carillon on Hoover Tower.
* = equal contribution.
Understanding black-box predictions via influence functions
Koh PW and Liang P. arXiv
How can we explain the predictions of a black-box model? In this paper, we use influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, identifying the training points most responsible for a given prediction. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for many different purposes: understanding model behavior, debugging models and detecting dataset errors, and even identifying and exploiting vulnerabilities to adversarial training-set attacks.
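To make the idea concrete, here is a minimal sketch, assuming a hypothetical one-parameter ridge least-squares model (not the models or code from the paper). Upweighting a training point z by a small ε moves the parameters by about −ε H⁻¹ ∇L(z, θ̂), so the first-order effect on a test loss is I_up,loss(z, z_test) = −∇L(z_test, θ̂)ᵀ H⁻¹ ∇L(z, θ̂); removing z corresponds to ε = −1/n. Because this toy model has a single parameter, the Hessian H is a scalar and can be computed exactly. All function names and data are illustrative.

```python
def fit(xs, ys, lam):
    """Closed-form minimizer of (1/n) * sum 0.5*(theta*x - y)^2 + 0.5*lam*theta^2."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sxx = sum(x * x for x in xs) / n
    return sxy / (sxx + lam)


def loss(theta, x, y):
    return 0.5 * (theta * x - y) ** 2


def grad(theta, x, y):
    """Derivative of loss(theta, x, y) with respect to theta."""
    return (theta * x - y) * x


def predicted_removal_effect(xs, ys, i, x_test, y_test, lam=0.01):
    """Influence-function estimate of the change in test loss if training point i is removed."""
    n = len(xs)
    theta = fit(xs, ys, lam)
    hessian = sum(x * x for x in xs) / n + lam  # exact, since the model has one parameter
    # I_up,loss(z, z_test) = -grad_test * H^{-1} * grad_z
    i_up_loss = -grad(theta, x_test, y_test) * grad(theta, xs[i], ys[i]) / hessian
    # Removing z corresponds to upweighting it by eps = -1/n
    return -i_up_loss / n


def actual_removal_effect(xs, ys, i, x_test, y_test, lam=0.01):
    """Ground truth for comparison: retrain without point i and compare test losses."""
    theta = fit(xs, ys, lam)
    theta_loo = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
    return loss(theta_loo, x_test, y_test) - loss(theta, x_test, y_test)
```

On the hypothetical data xs = [1, 2, 3, 4], ys = [1, 2, 3, 8] with test point (5, 5), the estimate for removing the outlier (4, 8) is about −3.3, versus about −3.5 from actually retraining: both flag the outlier as the point hurting this prediction, but the estimate needs no retraining.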
A roadmap for early human liver development from pluripotent stem cells
Ang LT, Tan KYA, Goh SHJ, Choo SH, ..., Koh PW, Weissman IL, Chen QF, Loh KM, Lim B. Under review.
We describe an efficient protocol for differentiating human embryonic stem cells (ESCs) into a relatively homogeneous population of
hepatocytes and demonstrate that these cells can engraft in damaged mouse livers, improving overall survival.
Localized hepatic lobular regeneration by central-vein-associated lineage-restricted progenitors
Tsai JM, Koh PW, Walmsley GG, Poux N, Weissman IL, Rinkevich Y. Proceedings of the National Academy of Sciences
When an adult mammalian liver undergoes acute tissue loss (e.g., injury to a lobe through partial hepatectomy), the remaining liver cells undergo a program of expansion and cell division that recovers organ mass but leaves liver morphology and architecture permanently altered. Here, we identify a specific time window after birth in which similar injury instead triggers regeneration, after which the injured lobe is indistinguishable from uninjured ones. We study this previously unknown program of liver regeneration, using clonal analysis to track the fate of hepatocyte progenitors at the injured sites. These results hint at a therapeutic window in which specific cells can undergo clonal expansion to restore normal structure and function after injury.
An atlas of transcriptional, chromatin accessibility, and surface marker changes in human mesoderm development
Koh PW*, Sinha R*, Barkal A, Morganti R, Chen A, Weissman I, Ang LT, Kundaje A, Loh K. Scientific Data
We study the dynamics of transcription, chromatin accessibility, and surface markers as pluripotent stem cells differentiate
through mesoderm intermediates into bone, heart, and other cell types. Using the mesoderm populations described in our related
Cell paper, we run bulk-population RNA-seq, single-cell RNA-seq, ATAC-seq, and high-throughput surface marker screening
to characterize changes across differentiation. In contrast to the biology-focused Cell paper, this paper focuses on the aspects of data processing, quality control, and computational analysis.
A comprehensive roadmap from pluripotency to human bone, heart and other mesoderm cell types
Loh KM*, Chen A*, Koh PW, Deng TZ, Sinha R, ..., Kundaje A, Talbot WS, Beachy PA, Ang LT, Weissman IL. Cell
We chart a developmental roadmap that allows us to differentiate pluripotent stem cells into twelve mesodermal lineages, including bone, muscle, and heart. We use this differentiation system to produce pure populations of human bone and heart progenitors that successfully engraft in mouse models in vivo. Our system also allows us to study previously unobservable events in human embryonic development; using single-cell RNA-seq, we discovered a new genetic marker of somite segmentation.
Denoising genome-wide histone ChIP-seq with convolutional neural networks
Koh PW*, Pierson E*, Kundaje A. ICML 2016 Workshop on Computational Biology (Best Poster Award). To appear at ISMB 2017 and in Bioinformatics.
Biological data is often extremely noisy. Can we make use of structure in the data to remove some of the noise? In this work, we focus on
chromatin immunoprecipitation sequencing (ChIP-seq) experiments targeting histone modifications and show that a convolutional neural network trained on matching pairs of noisy and high-quality data can significantly improve data quality. This approach is generally applicable to biological problems where it is relatively easy to generate noisy versions of high-quality data, but difficult to analytically characterize the noise or underlying data distributions.
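The core recipe, learning a convolutional mapping from paired noisy and high-quality signals, can be sketched in a few lines. This is a toy under heavy assumptions: a single linear 1-D convolutional filter stands in for the paper's deep network, a sine wave stands in for a histone ChIP-seq track, and added Gaussian noise stands in for a low-quality experiment. Names such as `train_denoiser` and `make_pair` are hypothetical.

```python
import math
import random


def conv1d_same(x, w):
    """Cross-correlate x with an odd-length filter w, zero-padded so the output has len(x)."""
    k = len(w) // 2
    n = len(x)
    return [
        sum(w[j] * x[i + j - k] for j in range(len(w)) if 0 <= i + j - k < n)
        for i in range(n)
    ]


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def make_pair(seed, n=200, noise=0.3):
    """Hypothetical toy data: a smooth 'high-quality' track and a noisy copy of it."""
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * i / 50) for i in range(n)]
    noisy = [c + rng.gauss(0.0, noise) for c in clean]
    return noisy, clean


def train_denoiser(noisy, clean, width=5, lr=0.1, steps=300):
    """Fit one linear conv filter mapping noisy -> clean by full-batch gradient descent on MSE."""
    k = width // 2
    n = len(noisy)
    w = [0.0] * width
    w[k] = 1.0  # start from the identity filter, i.e. "no denoising"
    for _ in range(steps):
        pred = conv1d_same(noisy, w)
        grad = [0.0] * width
        for i in range(n):
            err = pred[i] - clean[i]
            for j in range(width):
                if 0 <= i + j - k < n:
                    grad[j] += 2.0 * err * noisy[i + j - k] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


train_noisy, train_clean = make_pair(seed=0)
w = train_denoiser(train_noisy, train_clean)
test_noisy, test_clean = make_pair(seed=1)  # a fresh noisy copy for evaluation
print(mse(test_noisy, test_clean), mse(conv1d_same(test_noisy, w), test_clean))
```

Evaluated on a fresh noisy copy, the learned smoothing filter brings the mean squared error well below that of the raw noisy input. Real ChIP-seq noise is not simple additive Gaussian noise, which is where a deep, nonlinear model earns its keep; the linear filter only illustrates the paired training setup.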
Identifying genetic drivers of cancer morphology
Koh PW, Beck A, and Koller D. Undergraduate honors thesis; published in Undergraduate Awards Library 2012
Awarded the Firestone Medal for Excellence in Research, the Ben Wegbreit Prize for Best Undergraduate Honors Thesis in CS, the David M. Kennedy Honors Thesis Prize for best thesis across Stanford engineering and applied sciences, and the 2012 Undergraduate Award in Computer Science and Information Technology, an international research award.
Cancer cells have both abnormal morphology and anomalous gene expression.
How are morphology and gene expression linked?
To answer this, we extracted clinically relevant features
from tumor micrographs, and then developed new multi-task regression methods to
associate these image features with gene expression.
We used this method to study data from hundreds of breast cancer patients, deriving testable hypotheses
about the effect of specific genes on tumor morphology.
Peer and self assessment in massive online classes
Kulkarni C, Koh PW, Le H, Chia D, Papadopoulos K, Koller D, Klemmer S. ACM Transactions on Computer-Human Interaction 2013 and Design Thinking Research
Can we use peer- and self-assessment in MOOCs to scale up assessment and learning
in global classrooms?
We analyzed data from the first MOOC to use peer- and self-assessment and showed
that these forms of assessment are effective and scalable, with
peer grades correlating highly with staff grades. We also experimented with
giving graders automatic feedback and using data to design better rubrics, further increasing grading accuracy.
Dissecting an online intervention for cancer survivors
Chen Z, Koh PW, Ritter PL, Lorig K, Bantum E, Saria S. Health Education & Behavior
The debilitating effects of cancer can last long after initial treatment, even
if the cancer is in remission. To cope with this, cancer survivors have increasingly turned towards online
peer support groups. Using data from these groups, we studied
how online participation affects downstream health outcomes, with an eye towards
being able to better design such peer support groups.
Sparse filtering
Ngiam J, Koh PW, Chen Z, Bhaskar S, Ng AY. NIPS
Many existing algorithms for unsupervised feature learning either require extensive parameter tuning
or are unable to scale to large input sizes.
Here, we introduced sparse filtering, a simple new feature learning method that scales gracefully and
has only one parameter to tune.
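A minimal sketch of the sparse filtering objective as we read it: compute soft-absolute feature activations, normalize each feature to unit L2 norm across examples, normalize each example to unit L2 norm across features, and sum the L1 norms of the result. Learning the features means minimizing this over the weights W with a gradient-based optimizer (not shown). The number of features, i.e. the number of rows of W, is the setting to choose; the small smoothing constant `eps` is treated as fixed. The list-of-lists representation and function name are illustrative.

```python
import math


def sparse_filtering_objective(W, X, eps=1e-8):
    """Objective for weights W (k rows of length d) on examples X (m rows of length d)."""
    # Soft-absolute activations: F[j][i] = sqrt(eps + (w_j . x_i)^2)
    F = [[math.sqrt(eps + sum(a * b for a, b in zip(w, x)) ** 2) for x in X] for w in W]
    # Normalize each feature (row) to unit L2 norm across the m examples
    F = [[v / math.sqrt(sum(u * u for u in row)) for v in row] for row in F]
    # Normalize each example (column) to unit L2 norm, then add up its L1 norm
    total = 0.0
    for i in range(len(X)):
        col = [row[i] for row in F]
        total += sum(abs(v) for v in col) / math.sqrt(sum(v * v for v in col))
    return total


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # k = 3 hypothetical features
X = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5], [2.0, 2.0]]  # m = 4 toy examples
print(sparse_filtering_objective(W, X))
```

Because both normalizations divide out the overall scale, rescaling W leaves the objective essentially unchanged. Each example contributes between 1 and sqrt(k) for k features, so lower values mean sparser per-example feature vectors.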
Learning deep energy models
Ngiam J, Chen Z, Koh PW, Ng AY. ICML
We introduced deep energy models, a type of deep generative model which uses several layers of feedforward
functions to model the probability distribution of data.
Our model admits efficient inference and obtains good generative and classification
performance on natural and synthetic image data.
On random weights and unsupervised feature learning
Saxe A, Koh PW, Chen Z, Bhand M, Suresh B, Ng AY. ICML 2011. Previously appeared in the Workshop on Deep Learning and Unsupervised Feature Learning, NIPS
Some feature learning architectures do well on object recognition tasks
even when their feature weights are completely untrained and random. Why can random weights do so well?
We showed that certain architectures can be inherently frequency selective and translation invariant, even with random weights.
Indeed, much of the performance of certain state-of-the-art methods comes from the architecture rather than from training.
Based on this, we showed how random weights can be used to perform extremely fast architecture searches.
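The frequency-selectivity argument can be checked numerically with a 1-D toy, assuming one random filter applied by circular convolution and followed by L2 (square-root-of-sum-of-squares) pooling over all positions. The architectures in the paper are 2-D and pool over local regions, so this is only a sketch of the idea. For a cosine input at integer frequency f, the pooled response equals the magnitude of the filter's discrete Fourier transform at f times sqrt(n/2), so even an untrained filter responds most strongly to one particular frequency, and circularly shifting the input leaves the response unchanged. Function names are illustrative.

```python
import cmath
import math
import random


def circular_conv(x, w):
    """y[i] = sum_t w[t] * x[(i - t) mod n]: one filter applied at every position."""
    n = len(x)
    return [sum(w[t] * x[(i - t) % n] for t in range(len(w))) for i in range(n)]


def l2_pool(y):
    """Square-root-of-sum-of-squares pooling over all positions."""
    return math.sqrt(sum(v * v for v in y))


def response(x, w):
    return l2_pool(circular_conv(x, w))


def filter_gain(w, f, n):
    """Magnitude of the n-point DFT of filter w at integer frequency f."""
    return abs(sum(wt * cmath.exp(-2j * math.pi * f * t / n) for t, wt in enumerate(w)))


rng = random.Random(0)
n = 32
w = [rng.gauss(0.0, 1.0) for _ in range(5)]  # random weights, never trained
freqs = range(1, n // 2)  # skip f = 0 and f = n/2
responses = [response([math.cos(2 * math.pi * f * s / n) for s in range(n)], w) for f in freqs]
best_f = max(freqs, key=lambda f: responses[f - 1])
```

Here `best_f` coincides with the frequency at which `filter_gain` peaks, even though `w` was never trained.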
Tiled convolutional neural networks
Convolutional neural networks, in which small patch-based filters are replicated across the whole image,
have seen much success in tasks like digit and object recognition.
However, enforcing strict convolution (i.e., each filter is the same at every location) may be unnecessarily restrictive.
Here, we proposed tiled convolutional neural networks that use a regular "tiled" pattern of tied weights, avoiding the need for adjacent filters to be identical.
This flexibility allows us to learn complex invariances and achieve competitive object classification results.
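A minimal 1-D sketch of the weight-tying pattern, assuming a "valid" convolution and a tile size k equal to the number of filters: output unit i uses filter i mod k, so units k steps apart share weights while adjacent units need not. With k = 1 this reduces to ordinary convolution. The function name and example values are illustrative, and pooling layers are omitted.

```python
def tiled_conv1d(x, filters):
    """Valid 1-D tiled convolution: output unit i uses filters[i % k], k = len(filters)."""
    k = len(filters)
    width = len(filters[0])
    return [
        sum(filters[i % k][j] * x[i + j] for j in range(width))
        for i in range(len(x) - width + 1)
    ]


print(tiled_conv1d([1, 2, 3, 4, 5, 6], [[1, 0], [0, 1]]))  # → [1, 3, 3, 5, 5]
```

Shifting the input by k = 2 positions shifts this output by exactly 2, but a one-position shift does not; that relaxed invariance is what the tiling trades for flexibility.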
Lower bound on the time complexity of local adiabatic evolution
Chen Z, Koh PW, Yan Z. Physical Review A
We presented two simple approaches for evaluating the time complexity of local adiabatic evolution
using time-independent parameters. This lets us calculate the time complexity of algorithms using quantum
adiabatic evolution without needing to evaluate the entire time-dependent gap function.
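For context, the quantity being studied can be written down in the standard local adiabatic setup of Roland and Cerf (background only, not the two approaches from the paper). The interpolation parameter s advances at a rate proportional to the squared instantaneous gap g(s) between the ground and first excited states, with ε a small constant (assuming the coupling matrix element is bounded by 1), and the run time T is the resulting integral. Grover search, where the gap has a closed form, gives the familiar square-root scaling. Evaluating T this way requires g(s) along the whole path, which is exactly what the approaches above avoid.

```latex
\begin{aligned}
H(s) &= (1-s)\,H_0 + s\,H_1, \qquad \frac{ds}{dt} = \varepsilon\, g(s)^2, \qquad
T = \int_0^1 \frac{ds}{\varepsilon\, g(s)^2},\\
\text{Grover search: } g(s) &= \sqrt{1 - 4\left(1 - \tfrac{1}{N}\right) s(1-s)}
\;\Longrightarrow\;
T = \frac{1}{\varepsilon}\,\frac{N}{\sqrt{N-1}}\,\arctan\sqrt{N-1} \;\approx\; \frac{\pi}{2\varepsilon}\sqrt{N}.
\end{aligned}
```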
At Coursera, we were fortunate to have troves of data on what makes for effective teaching.
I spoke frequently at workshops and conferences about online education
and worked with many instructors on their courses. My team designed
authoring tools and analytics dashboards for our instructors.
In 2012, I was head TA for CS228 at Stanford, Daphne's class on
Probabilistic Graphical Models.
Together with 8 other TAs, I revamped the class to make it
application-focused and auto-gradable, and successfully taught it to 200+ Stanford students
and 100,000+ online learners on the Coursera platform.
Before college, Zhenghao Chen and I created and taught a series of 14 full-day workshops for 100+ high school students, covering introductions to programming, artificial intelligence, cryptography, and computer networking.