BrisquelyBrusque writes "I think what he's getting at is, we'll never have an algorithm that is
1. fast, distributed, easily deployed
2. interpretable
3. able to converge quickly for most problems
4. robust to noise, outliers, multicollinearity, class imbalance, and the curse of dimensionality
5. optimized for any combination of numeric variables and factors
6. self-supervised (no need for extensive parameter tuning)
7. capable of probability estimates as well as predictions
8. able to issue predictions for multiple targets
9. comfortable with structured and unstructured data (text, 2D, 3D, audio, tabular)
10. open-source
Besides, a recent analysis by Amazon Web Services found that 50 to 95% of all ML applications in an organization are based on traditional ML (random forests, regression models). That's why these application papers matter -- we're learning to make progress in certain areas where traditional ML fails."
Isabel Segura-Bedmar, Víctor Suárez-Paniagua, Paloma Martínez
Computer Science Department
University Carlos III of Madrid, Spain
This paper describes a machine learning-based approach that uses word embedding features to recognize drug names in biomedical texts. As a starting point, we developed a baseline system based on a Conditional Random Field (CRF) trained with the standard features used in current Named Entity Recognition (NER) systems.
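The abstract does not list the "standard features" of the CRF baseline. The following is a minimal sketch, assuming a typical NER feature set (lowercased word, 3-character prefix and suffix, capitalization, digit and hyphen flags, and the neighboring words), of how each token can be turned into a feature dict. The function names are hypothetical, and this is not presented as the authors' exact feature set.

```python
from typing import Dict, List, Union

FeatureValue = Union[str, bool]


def token_features(sent: List[str], i: int) -> Dict[str, FeatureValue]:
    """Illustrative orthographic and context features for token i of a sentence."""
    word = sent[i]
    feats: Dict[str, FeatureValue] = {
        "word.lower": word.lower(),
        "prefix3": word[:3],
        "suffix3": word[-3:],  # short suffixes often signal drug names, e.g. "cil", "mab"
        "is_upper": word.isupper(),
        "is_title": word.istitle(),
        "has_digit": any(ch.isdigit() for ch in word),
        "has_hyphen": "-" in word,
    }
    # Context window of one token on each side, with sentence-boundary markers.
    if i > 0:
        feats["-1:word.lower"] = sent[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["+1:word.lower"] = sent[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sent_features(sent: List[str]) -> List[Dict[str, FeatureValue]]:
    """One feature dict per token: the per-sentence input a linear-chain CRF consumes."""
    return [token_features(sent, i) for i in range(len(sent))]
```

For example, `sent_features(["Patients", "received", "5-fluorouracil", "daily", "."])` yields five dicts, and the third one records the suffix "cil" along with the digit and hyphen flags. CRF toolkits such as sklearn-crfsuite accept a list of these per-sentence lists, paired with per-token labels (e.g. BIO tags), as training input.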
Then, the system was extended to incorporate new features, such as word vectors and word clusters generated by the Word2Vec tool, and a lexicon feature from the DINTO ontology. We trained the Word2Vec tool on two different corpora: Wikipedia and MedLine. Our main goal is to study the effectiveness of using word embeddings as features to improve the performance of our baseline system, and to analyze whether the DINTO ontology could be a valuable complementary data source when integrated into a machine learning NER system. To evaluate our approach and compare it with previous work, we conducted a series of experiments on the dataset of SemEval-2013 Task 9.1, Drug Name Recognition.
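The abstract does not spell out how the word vectors, word clusters, and DINTO lexicon are encoded as CRF features. Here is one plausible encoding, sketched under stated assumptions: each vector dimension becomes a real-valued feature, the word's k-means cluster id becomes a categorical feature (assuming k-means clustering, which the original word2vec C tool offers through its `-classes` option), and membership in a drug lexicon becomes a boolean. The 2-D vectors, the `lexicon` set, and the helper `add_embedding_features` are all hypothetical illustrations, not the authors' resources or code.

```python
from typing import Dict, List, Sequence, Set, Union

Vector = List[float]
FeatureValue = Union[str, bool, float]


def kmeans(vectors: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means with a deterministic init (the first k vectors)."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for n, v in enumerate(vectors):
            assign[n] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[n] for n in range(len(vectors)) if assign[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical 2-D "embeddings"; real ones would come from Word2Vec trained on
# Wikipedia or MedLine and have far more dimensions.
emb: Dict[str, Vector] = {
    "aspirin": [0.9, 0.1],
    "patient": [0.1, 0.9],
    "ibuprofen": [0.85, 0.15],
    "warfarin": [0.8, 0.2],
    "dose": [0.2, 0.8],
}
words = list(emb)
clusters: Dict[str, int] = dict(zip(words, kmeans([emb[w] for w in words], k=2)))

# Hypothetical stand-in for drug names drawn from the DINTO ontology.
lexicon: Set[str] = {"aspirin", "warfarin"}


def add_embedding_features(
    feats: Dict[str, FeatureValue], word: str
) -> Dict[str, FeatureValue]:
    """Extend a token's feature dict with vector, cluster, and lexicon features."""
    w = word.lower()
    vec = emb.get(w)
    if vec is not None:
        for d, x in enumerate(vec):
            feats[f"w2v_{d}"] = x  # one real-valued feature per dimension
        feats["cluster"] = str(clusters[w])  # categorical cluster id
    else:
        feats["cluster"] = "OOV"  # out-of-vocabulary words share one id
    feats["in_lexicon"] = w in lexicon
    return feats
```

In this toy setup "ibuprofen" is missing from the lexicon but lands in the same cluster as "aspirin" and "warfarin", which illustrates why cluster features might complement a lexicon feature. The returned dict can be merged with standard orthographic features for the same token before CRF training.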