A discussion of the challenges and promise of deep learning for outlier detection across data modalities, including image and tabular data, with a focus on self-supervised learning techniques.
The article discusses techniques to improve outlier detection in tabular data by using subsets of features, known as subspaces, which can reduce the curse of dimensionality, increase interpretability, and make detectors more efficient to run and to tune over time.
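To illustrate the idea, here is a minimal sketch of subspace-based detection: a standard detector (IsolationForest here, an illustrative choice rather than the article's specific method) is run on several random feature subsets, and the per-point outlier scores are averaged.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))   # toy tabular dataset
X[:5] += 6                       # a few planted outliers

n_subspaces, subspace_dim = 10, 4
scores = np.zeros(len(X))
for _ in range(n_subspaces):
    # Draw a random subspace and fit a detector on just those features
    cols = rng.choice(X.shape[1], size=subspace_dim, replace=False)
    det = IsolationForest(random_state=0).fit(X[:, cols])
    # score_samples is higher for normal points, so negate it
    scores += -det.score_samples(X[:, cols])

scores /= n_subspaces   # average outlier score across subspaces
print("top suspected outliers:", np.argsort(scores)[-5:])
```

Beyond speed, each flagged point can be explained by the handful of subspaces in which it scored highly, which is where the interpretability benefit comes from.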
This article discusses how traditional machine learning methods, particularly outlier detection, can be used to improve the precision and efficiency of Retrieval-Augmented Generation (RAG) systems by filtering out irrelevant queries before document retrieval.
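As a rough sketch of how such a filter could sit in front of retrieval: fit a novelty detector on the embeddings of the indexed documents, and only run retrieval for queries whose embeddings look in-distribution. LocalOutlierFactor, the doc_embeddings.npy file, and the is_in_scope helper are all illustrative assumptions, not the article's actual pipeline.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical precomputed embeddings of the indexed documents
doc_embeddings = np.load("doc_embeddings.npy")

# novelty=True lets LOF score points it has not seen during fit
detector = LocalOutlierFactor(n_neighbors=20, novelty=True)
detector.fit(doc_embeddings)

def is_in_scope(query_embedding: np.ndarray) -> bool:
    """True if the query embedding looks like it belongs to the corpus."""
    # predict() returns +1 for inliers and -1 for outliers
    return detector.predict(query_embedding.reshape(1, -1))[0] == 1

# Queries flagged as out-of-scope can skip retrieval entirely and be
# routed to a fallback response, saving a wasted retrieval round trip.
```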
Clean data is crucial for machine learning model accuracy and benchmarking. Learn 9 techniques to clean your ML datasets, from handling missing data to automating pipelines.
The article emphasizes the importance of data cleaning in machine learning model development and benchmarking. It highlights nine techniques for cleaning datasets, ensuring accurate model comparisons and reproducibility. The techniques include using DagsHub's Data Engine for data management, handling missing data with KNN imputation and MissForest, detecting outliers with DBSCAN, fixing structural errors with OpenRefine, removing duplicates with Pandas, normalizing and standardizing data with scikit-learn, automating cleaning pipelines with Apache Airflow and Kubeflow, validating data integrity with Great Expectations, and addressing data drift with Deepchecks.
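For the missing-data step specifically, a minimal sketch with scikit-learn's KNNImputer might look like the following; the toy array is illustrative, and MissForest from missingpy exposes a similar fit/transform interface.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy feature matrix with missing entries (illustrative values only)
X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [5.0, 4.0, 9.0],
              [np.nan, 8.0, 7.0]])

# Each missing value is filled with the average of that feature over the
# k nearest rows, where distance is computed on the observed features.
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```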
**Tools and Their Main Use**
| **Tool** | **Main Use** |
| --- | --- |
| 1. **DagsHub's Data Engine** | Data management and versioning for ML teams |
| 2. **KNN Imputation (scikit-learn)** | Handling missing data by imputing values based on nearest neighbors |
| 3. **MissForest (missingpy)** | Advanced imputation for missing values using Random Forests |
| 4. **DBSCAN (scikit-learn)** | Outlier detection and removal in high-dimensional datasets |
| 5. **OpenRefine** | Fixing structural errors and inconsistencies in datasets |
| 6. **Pandas / scikit-learn** | Duplicate removal (Pandas); data normalization and standardization (scikit-learn) |
| 7. **Apache Airflow** | Automating data cleaning pipelines and workflows |
| 8. **Kubeflow Pipelines** | Scalable and portable automation of end-to-end ML workflows |
| 9. **Great Expectations** | Data integrity validation and setting expectations for dataset quality |
| 10. **Deepchecks** | Monitoring and addressing data drift in machine learning models |
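To make a few of these steps concrete, here is a minimal sketch that chains duplicate removal (Pandas), standardization (scikit-learn), and density-based outlier removal (DBSCAN) on a toy DataFrame. The column names and the eps/min_samples settings are illustrative assumptions, not values from the article.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age":    [25, 25, 31, 48, 52, 39, 250],   # 250 is a planted outlier
    "income": [40_000, 40_000, 52_000, 88_000, 91_000, 61_000, 60_000],
})

# 1. Remove exact duplicate rows (Pandas)
df = df.drop_duplicates().reset_index(drop=True)

# 2. Standardize features to zero mean and unit variance (scikit-learn)
X = StandardScaler().fit_transform(df)

# 3. DBSCAN labels points that fall in no dense region as -1 (noise),
#    which serves as a simple outlier flag
labels = DBSCAN(eps=1.0, min_samples=2).fit_predict(X)
df_clean = df[labels != -1]
print(df_clean)
```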
PCA (principal component analysis) can be used effectively for outlier detection: projecting the data onto its principal components reduces dimensionality and reorients it along the directions of greatest variance, so points that deviate from the dominant patterns become much easier to identify.
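One common way to operationalize this, sketched below under the assumption of reconstruction-error scoring (the article may use a different criterion), is to fit PCA with a few components and flag the points the low-dimensional model reconstructs worst.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data lying near a 2-D plane in 10-D, plus a few planted outliers
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
W = rng.normal(size=(2, 10))
X = latent @ W + 0.1 * rng.normal(size=(300, 10))
X[:3] += 4 * rng.normal(size=(3, 10))   # points pushed off the plane

# Project onto 2 principal components and map back to the original space
pca = PCA(n_components=2).fit(X)
X_recon = pca.inverse_transform(pca.transform(X))

# Points the 2-D model cannot reconstruct well are outlier candidates
recon_error = np.linalg.norm(X - X_recon, axis=1)
print("top suspected outliers:", np.argsort(recon_error)[-3:])
```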
Outlier treatment is a necessary step in data analysis. This article, part 3 of a four-part series, walks through the process and offers insights on effective methods and tools for outlier detection.