Lak Lakshmanan provides a framework for choosing the architecture of a GenAI (Generative AI) application, balancing creativity and risk. The framework consists of eight patterns:
Generate Each Time: Invoke the LLM API for every request, suitable for high creativity and low-risk tasks like internal tools.
Response/Prompt Caching: Cache past prompts and responses to reduce cost and latency, ideal for medium creativity and low-risk tasks like internal customer support.
Pregenerated Templates: Use pre-vetted templates for repetitive tasks, reducing human review needs. Suitable for medium creativity and low-medium risk tasks.
Small Language Models (SLMs): Use smaller models for low creativity and low-risk tasks, reducing hallucinations and cost.
Assembled Reformat: Assemble the response from pre-generated content and use LLMs only for reformatting and summarization, which helps keep the output accurate.
ML Selection of Template: Use machine learning to select appropriate pre-generated templates based on user context, balancing personalization with risk.
Fine-tune: Fine-tune LLMs to generate desired content while minimizing undesired outputs, addressing specific risks like brand voice or confidentiality.
Guardrails: Implement preprocessing, post-processing, and iterative prompting for high creativity and high-risk tasks, using off-the-shelf or custom-built guardrails.
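To make the Guardrails pattern concrete, here is a minimal sketch of preprocessing, post-processing, and iterative prompting wrapped around a model call. Everything named here is an assumption for illustration: the `llm` callable stands in for whatever LLM API is in use, and the blocked terms, the output pattern, the retry instruction, and `FALLBACK_MESSAGE` are hypothetical policies, not part of the framework itself.

```python
import re
from typing import Callable

# Hypothetical policies, chosen only for illustration.
BLOCKED_INPUT_TERMS = ("password", "ssn")
BLOCKED_OUTPUT_PATTERN = re.compile(r"\b(guarantee[ds]?|lawsuit)\b", re.IGNORECASE)
FALLBACK_MESSAGE = "Sorry, I can't help with that request."


def preprocess(prompt: str) -> str:
    """Input guardrail: reject blocked terms, then normalize whitespace."""
    lowered = prompt.lower()
    for term in BLOCKED_INPUT_TERMS:
        if term in lowered:
            raise ValueError(f"prompt rejected by input guardrail: {term!r}")
    return " ".join(prompt.split())


def passes_output_check(text: str) -> bool:
    """Output guardrail: fail any draft that matches the blocked pattern."""
    return BLOCKED_OUTPUT_PATTERN.search(text) is None


def generate_with_guardrails(
    prompt: str, llm: Callable[[str], str], max_attempts: int = 3
) -> str:
    """Call the model, re-prompting when a draft fails the output check."""
    clean = preprocess(prompt)
    current = clean
    for _ in range(max_attempts):
        draft = llm(current)
        if passes_output_check(draft):
            return draft
        # Iterative prompting: restate the request with an extra constraint.
        current = clean + "\nRewrite the answer without promises or legal language."
    # Every attempt failed, so return a pre-vetted fallback instead.
    return FALLBACK_MESSAGE
```

An off-the-shelf guardrail library could replace the hand-written checks; the control flow (check input, generate, check output, retry, fall back) would stay roughly the same.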
This framework helps balance complexity, fit-for-purpose, risk, cost, and latency for each use case in a GenAI application.
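As one illustration of the cost and latency trade-off, the Response/Prompt Caching pattern can be sketched as a thin wrapper that falls back to Generate Each Time on a cache miss. This is a sketch under assumptions, not a definitive implementation: the `CachedLLM` class, the in-memory dictionary, and the normalization rule (lowercase, collapsed whitespace) are hypothetical choices, and `llm` is a stand-in for any LLM API call.

```python
import hashlib
from typing import Callable, Dict


class CachedLLM:
    """Wraps an LLM callable with an exact-match response cache (hypothetical helper)."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self._llm = llm
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Assumed normalization: trivially different prompts share one entry.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def generate(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._cache:
            # Cache hit: no model call, so no added cost or model latency.
            self.hits += 1
            return self._cache[key]
        # Cache miss: generate once and store the response for reuse.
        self.misses += 1
        response = self._llm(prompt)
        self._cache[key] = response
        return response
```

Because repeated prompts return the stored response, users see the same answer each time, which is the creativity the pattern gives up in exchange for lower cost and latency. A production cache would likely also need expiry and a size limit, which this sketch omits.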