klotz: data science* + machine learning*


  1. AI Nexus is a platform for collaboration, knowledge exchange, and groundbreaking discourse in AI. It features upcoming AI events, speaker series, and faculty contributions to the global AI community. The site also provides information on MBZUAI programs and opportunities for collaboration.
  2. This article details how to accelerate deep learning and LLM inference using Apache Spark, focusing on distributed inference strategies. It covers basic deployment with `predict_batch_udf`, advanced deployment with inference servers like NVIDIA Triton and vLLM, and deployment on cloud platforms like Databricks and Dataproc. It also provides guidance on resource management and configuration for optimal performance.
  3. The article showcases concise Python code snippets (one-liners) for common machine learning tasks like data splitting, standardization, model training (linear regression, logistic regression, decision tree, random forest), and prediction, leveraging libraries such as scikit-learn.

    | **#** | **One-Liner** | **Description** | **Library** | **Use Case** |
    |-----|-----------------------------------------------------|-------------------------------------------------------------------------------------|-------------------|-------------------------------------------------|
    | 1 | `from sklearn.datasets import load_iris; X, y = load_iris(return_X_y=True)` | Loads the Iris dataset, a classic for classification. | scikit-learn | Loading a standard dataset. |
    | 2 | `from sklearn.model_selection import train_test_split; X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)` | Splits the dataset into training and testing sets. | scikit-learn | Preparing data for model training & evaluation.|
    | 3 | `from sklearn.linear_model import LogisticRegression; model = LogisticRegression(random_state=1)` | Creates a Logistic Regression model. | scikit-learn | Classification. |
    | 4 | `model.fit(X_train, y_train)` | Trains the Logistic Regression model. | scikit-learn | Model training. |
    | 5 | `y_pred = model.predict(X_test)` | Predicts labels for the test dataset. | scikit-learn | Making predictions. |
    | 6 | `from sklearn.metrics import accuracy_score; accuracy = accuracy_score(y_test, y_pred)` | Calculates the accuracy of the model. | scikit-learn | Evaluating model performance. |
    | 7 | `import pandas as pd; df = pd.DataFrame(X, columns=load_iris().feature_names)` | Creates a Pandas DataFrame from the Iris dataset features. | Pandas | Data manipulation and analysis. |
    | 8 | `df['target'] = y` | Adds the target variable to the DataFrame. | Pandas | Combining features and labels. |
    | 9 | `df.head()` | Displays the first few rows of the DataFrame. | Pandas | Inspecting the data. |
    | 10 | `df.describe()` | Generates descriptive statistics of the DataFrame. | Pandas | Understanding data distribution. |
  4. NVIDIA DGX Spark is a desktop-friendly AI supercomputer powered by the NVIDIA GB10 Grace Blackwell Superchip, delivering 1000 AI TOPS of performance with 128GB of memory. It is designed for prototyping, fine-tuning, and inference of large AI models.
  5. The article explores the concept of Retrieval-Augmented Generation (RAG) using SQLite, specifically with the sqlite-vec extension and the OpenAI API. It outlines a simplified approach to RAG, moving away from complex frameworks and cloud vector databases, using SQLite's virtual tables for vector search and semantic understanding.
  6. A comprehensive guide to Large Language Models by Damien Benveniste, covering various aspects from transformer architectures to deploying LLMs.

    - Language Models Before Transformers
    - Attention Is All You Need: The Original Transformer Architecture
    - A More Modern Approach To The Transformer Architecture
    - Multi-modal Large Language Models
    - Transformers Beyond Language Models
    - Non-Transformer Language Models
    - How LLMs Generate Text
    - From Words To Tokens
    - Training LLMs to Follow Instructions
    - Scaling Model Training
    - Fine-Tuning LLMs
    - Deploying LLMs
  7. The article discusses methods for data scientists to answer 'what if' questions regarding the impact of actions or events without having conducted prior experiments. It focuses on creating counterfactual predictions using machine learning techniques and compares a proposed method with Google's Causal Impact. The approach involves using historical data and control groups to estimate the effect of modifications, addressing challenges such as seasonality, confounders, and temporal drift.
  8. This article provides an overview of feature selection in machine learning, detailing methods to maximize model accuracy and minimize computational costs, and introduces a novel method called History-based Feature Selection (HBFS).
  9. This article provides a non-technical guide to interpreting SHAP analyses, useful for explaining machine learning models to non-technical stakeholders, with a focus on both local and global interpretability using various visualization methods.
  10. A guide on how to use OpenAI embeddings and clustering techniques to analyze survey data and extract meaningful topics and actionable insights from the responses.

    The process involves transforming textual survey responses into embeddings, grouping similar responses through clustering, and then identifying key themes or topics to aid in business improvement.
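The embed, cluster, and label steps described in the last bookmark can be sketched in a few lines. This is only an illustration under stated assumptions: the survey responses are made up, `embed` is a bag-of-words stand-in for a real embedding model (the guide uses OpenAI embeddings, which are not called here), and the small k-means routine with farthest-point initialisation is a swapped-in, dependency-free substitute for whatever clustering the guide applies.

```python
import math
from collections import Counter

# Hypothetical survey responses, invented for this sketch.
RESPONSES = [
    "checkout page is slow",
    "slow checkout and slow loading",
    "page loading is slow",
    "support team was friendly",
    "friendly and helpful support",
    "helpful support team",
]
STOPWORDS = {"is", "and", "was", "the", "a"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def embed(text, vocab):
    """Stand-in embedding: L2-normalised word counts over a fixed vocabulary.
    A real pipeline would request vectors from an embedding model instead."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=10):
    """Lloyd's algorithm with deterministic farthest-point initialisation.
    Returns one cluster label per input vector."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from all centroids chosen so far.
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for each vector.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_terms(responses, labels, cluster, n=1):
    """Most frequent non-stopword terms among the responses in one cluster."""
    counts = Counter()
    for text, lab in zip(responses, labels):
        if lab == cluster:
            counts.update(tokenize(text))
    return [w for w, _ in counts.most_common(n)]


vocab = sorted({w for r in RESPONSES for w in tokenize(r)})
vectors = [embed(r, vocab) for r in RESPONSES]
labels = kmeans(vectors, k=2)
themes = {c: top_terms(RESPONSES, labels, c) for c in sorted(set(labels))}
print(labels)
print(themes)
```

With real embeddings the cluster labels would typically be summarised by a human reviewer or a language model rather than by the most frequent word; the frequency count here only keeps the sketch self-contained.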
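For the counterfactual-prediction bookmark (item 7), the core idea of estimating an effect from historical data and a control group can be illustrated with a deliberately simple model. The sketch below is not the article's proposed method and not Google's Causal Impact (which relies on Bayesian structural time series); it assumes a single control series, a linear relationship between control and treated series, and invented numbers, and it ignores seasonality, confounders, and drift.

```python
# Hypothetical daily metrics, invented for this sketch.
# Before the change, the treated series tracks the control series.
control_pre = [10, 12, 11, 13, 12, 14]
treated_pre = [21, 25, 23, 27, 25, 29]

# After the change, only the treated series was exposed to the modification.
control_post = [13, 15, 14]
treated_post = [32, 36, 34]


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# 1. Learn the pre-period relationship between control and treated series.
slope, intercept = fit_line(control_pre, treated_pre)

# 2. Counterfactual: what the treated series would have been without the change.
counterfactual = [slope * c + intercept for c in control_post]

# 3. Estimated effect: observed values minus the counterfactual prediction.
pointwise_effect = [obs - cf for obs, cf in zip(treated_post, counterfactual)]
average_effect = sum(pointwise_effect) / len(pointwise_effect)

print(counterfactual)
print(average_effect)
```

A more realistic version would use several control series, a model that captures seasonality, and uncertainty intervals around the counterfactual rather than a single point estimate.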
