This article explores the critical architectural decision of where to store conversation history when building AI agents. It examines how different storage strategies impact user experience, privacy, cost, and portability. The author compares service-managed versus client-managed storage models and details how modern APIs support both linear threads and forking/branching capabilities.
Key topics include:
* Service-Managed vs. Client-Managed storage tradeoffs
* Linear (single-threaded) vs. Forking-capable conversation models
* Strategies for context window management and compaction such as truncation, summarization, and sliding windows
* How Microsoft Agent Framework abstracts these patterns using AgentSession and ChatHistoryProvider to ensure provider-agnostic code
* Practical implementation examples for the Responses API in different modes
This tutorial demonstrates how to perform document clustering using LLM embeddings with scikit-learn. It covers generating embeddings with Sentence Transformers, reducing dimensionality with PCA, and applying KMeans clustering to group similar documents.
Unlock advanced customer segmentation techniques using LLMs, and improve your clustering models with advanced techniques
from scipy.spatial import Voronoi, voronoi_plot_2d
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
location1="XXX"
df = pd.read_csv(location1, encoding = "ISO-8859-1")
#Run kmeans clustering
X = df ['long','lat' » ].values #~2k locations in the UK
y=df 'label' » .values #Label is a 0 or 1
kmeans = KMeans(n_clusters=30, random_state=0).fit(X, y)
centers=kmeans.cluster_centers_
plt.scatter(centers :,0 » ,centers :,1 » , marker='s', s=100)
vor = Voronoi(centers)
fig = voronoi_plot_2d(vor,plt.gca())
plt.show()