Orchestration Framework: LangChain Deep Dive
Introduction
The orchestration layer plays a pivotal role in the emerging LLM tech stack by interacting with the various components in the data, model, and operational layers. In this article, we’ll explore the LangChain framework in more depth, covering major concepts (e.g. components, chains, and agents) and key characteristics (e.g. integrations, abstractions, and contexts). LangChain is part of a broader language model application ecosystem that includes other compatible tools, such as LangGraph for complex control flows and LangSmith for observability and evaluation. We’ll also do a quick comparison between programmatic and GUI orchestration frameworks. Lastly, we’ll review the advantages and limitations of LangChain.
A diagram of the layers and major components in the emerging LLM tech stack (accessed November 2024). Source: https://www.codesmith.io/blog/introducing-the-emerging-llm-tech-stack
LangChain
As a framework for developing applications powered by LLMs, LangChain is made up of several open-source libraries (available in JavaScript and Python; the JavaScript package names are shown here):
- @langchain/core: Contains the base abstractions and the LangChain Expression Language.
- @langchain/community: Houses community-maintained third-party integrations. Integrations for major providers, such as OpenAI and Anthropic, are published as separate packages (e.g. @langchain/openai and @langchain/anthropic).
- langchain: Contains the chains, agents, and retrieval strategies that make up an application's cognitive architecture.
The framework displays these key characteristics:
- Integrations: Extensive integrations with LLMs and cloud platforms.
- Abstractions: Modularity and abstractions for flexible LLM application building.
- Contexts: Context-awareness through data source connections and retrieval strategies, such as the Retrieval Augmented Generation (RAG) pattern.
In particular, the framework defines and exposes common interfaces for components critical to LLM applications. For instance, the retriever interface defines a standard way to connect to different data stores. Hence, all retrievers can be invoked in a common way, even though the underlying implementations are specific to the actual databases or data services.
```JavaScript
const docs = await retriever.invoke(query);
```
Retriever code example (accessed November 2024). Source: https://js.langchain.com/docs/concepts/retrievers/
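To illustrate why a shared interface matters, here is a minimal sketch in plain JavaScript with no LangChain imports. `KeywordRetriever`, `StaticRetriever`, and `answerSources` are hypothetical names invented for this example, and `invoke` is synchronous here for brevity, whereas LangChain retrievers return promises.

```JavaScript
// Hypothetical retriever backed by an in-memory array, matching on keywords.
class KeywordRetriever {
  constructor(docs) {
    this.docs = docs;
  }
  invoke(query) {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    return this.docs.filter((doc) =>
      terms.some((term) => doc.pageContent.toLowerCase().includes(term))
    );
  }
}

// Hypothetical retriever standing in for a remote data service.
class StaticRetriever {
  constructor(lookup) {
    this.lookup = lookup; // Map of query -> documents
  }
  invoke(query) {
    return this.lookup.get(query) ?? [];
  }
}

// Calling code depends only on the shared invoke(query) method,
// not on how each retriever finds its documents.
function answerSources(retriever, query) {
  return retriever.invoke(query).map((doc) => doc.pageContent);
}

const keywordRetriever = new KeywordRetriever([
  { pageContent: "LangChain defines a retriever interface." },
  { pageContent: "Tea is best served hot." },
]);
const staticRetriever = new StaticRetriever(
  new Map([["tea", [{ pageContent: "Tea is best served hot." }]]])
);

console.log(answerSources(keywordRetriever, "retriever"));
console.log(answerSources(staticRetriever, "tea"));
```

Because `answerSources` only relies on `invoke`, either retriever can be swapped in without changing the calling code.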
Components
Components are the core building blocks used in LLM applications. LangChain defines many helpful components.
For the data layer (non-exhaustive list):
- Document Loaders: Load files of different formats from a variety of sources, including CSV, PDF, HTML, and Markdown.
- Text Splitters: Take a document as input and split it into chunks.
- Embedding Models: Take a piece of data as input and create a numerical representation of it.
- Vector Stores: Connect with vector databases to store and retrieve embeddings.
- Retrievers: Take a query as input and return relevant documents.
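The way these data-layer components fit together can be sketched in plain JavaScript. This is a toy illustration, not LangChain's implementation: `splitText`, `embed`, `cosineSimilarity`, and `ToyVectorStore` are hypothetical names, and the letter-count "embedding" is only a stand-in for a real embedding model.

```JavaScript
// Text splitter: break a document into chunks of at most `chunkSize` words.
function splitText(text, chunkSize) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += chunkSize) {
    chunks.push(words.slice(i, i + chunkSize).join(" "));
  }
  return chunks;
}

// Toy embedding: count the letters a-z. Real embedding models capture
// meaning rather than spelling; this keeps the example self-contained.
function embed(text) {
  const vector = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const index = ch.charCodeAt(0) - 97;
    if (index >= 0 && index < 26) vector[index] += 1;
  }
  return vector;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// In-memory vector store that also acts as a simple retriever.
class ToyVectorStore {
  constructor() {
    this.entries = [];
  }
  add(texts) {
    for (const text of texts) {
      this.entries.push({ text, vector: embed(text) });
    }
  }
  search(query, k) {
    const queryVector = embed(query);
    return this.entries
      .map((entry) => ({
        text: entry.text,
        score: cosineSimilarity(queryVector, entry.vector),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map((entry) => entry.text);
  }
}

const chunks = splitText("Cats purr softly. Dogs bark loudly.", 3);
const store = new ToyVectorStore();
store.add(chunks);
console.log(store.search("dogs bark loudly", 1)); // [ 'Dogs bark loudly.' ]
```

In a real application, a document loader would supply the text, and the embedding and storage steps would be delegated to an embedding model and a vector database.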
For the model layer (non-exhaustive list):
- Chat Models: Language models that take messages as input and output a message.
- Messages: The input and output of chat models. Some messages have the `role` parameter defined, which specifies the source of the messages.
- Prompt Templates: Format user input and other instructions into a formal structure that can be passed to a language model.
- Output Parsers: Take the output of a language model and parse it into a structured format.
- Memory: Provide simple utilities for adding memory to a system, such as retaining conversation history.
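As a rough sketch of how the model-layer pieces relate, the following plain JavaScript wires a prompt template, messages with a `role`, an output parser, and a simple conversation memory around a stand-in model. All function names here are hypothetical, and `fakeChatModel` returns a canned reply instead of calling an LLM provider.

```JavaScript
// Prompt template: fill {placeholders} with the supplied values.
function formatPrompt(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, key) => {
    if (!(key in values)) throw new Error(`Missing input variable: ${key}`);
    return String(values[key]);
  });
}

// Stand-in chat model: takes an array of messages, returns one message.
// The canned reply replaces a real call to an LLM provider.
function fakeChatModel(messages) {
  return { role: "assistant", content: "Cozy Steps, Cloud Walkers, Soft Soles" };
}

// Output parser: turn a comma-separated reply into an array of strings.
function parseCommaSeparatedList(text) {
  return text
    .split(",")
    .map((item) => item.trim())
    .filter(Boolean);
}

// Memory: conversation history kept as a growing list of messages.
const history = [{ role: "system", content: "You suggest company names." }];

function ask(template, values) {
  history.push({ role: "user", content: formatPrompt(template, values) });
  const reply = fakeChatModel(history);
  history.push(reply);
  return parseCommaSeparatedList(reply.content);
}

const names = ask("What is a good name for a company that makes {product}?", {
  product: "comfortable shoes",
});
console.log(names); // [ 'Cozy Steps', 'Cloud Walkers', 'Soft Soles' ]
console.log(history.length); // 3
```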
For the operational layer (non-exhaustive list):
- Callbacks: Hook into the various stages of an LLM application's execution. They let you run custom auxiliary code inside built-in components, for example to stream output or trace intermediate steps.
- Tools: Encapsulate a function and associated schema to be passed to chat models for calling. Tools expand an LLM's access to information and functionality. Some examples include Google Search, Wikipedia, and Wolfram Alpha.
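The two operational components above can be sketched together in plain JavaScript. This is an illustrative toy: `calculatorTool`, `runTool`, and the `onToolStart`/`onToolEnd` hook names are assumptions made for this example rather than LangChain's actual API.

```JavaScript
// A tool bundles a function with the schema a chat model needs to call it.
const calculatorTool = {
  name: "add_numbers",
  description: "Add two numbers and return the sum.",
  schema: {
    type: "object",
    properties: { a: { type: "number" }, b: { type: "number" } },
    required: ["a", "b"],
  },
  func: ({ a, b }) => a + b,
};

// Run a tool: check required arguments, then notify callbacks
// before and after the underlying function executes.
function runTool(tool, args, callbacks = []) {
  for (const key of tool.schema.required) {
    if (!(key in args)) throw new Error(`Missing argument: ${key}`);
  }
  callbacks.forEach((cb) => cb.onToolStart?.(tool.name, args));
  const output = tool.func(args);
  callbacks.forEach((cb) => cb.onToolEnd?.(tool.name, output));
  return output;
}

// A tracing callback that records each stage of execution.
const trace = [];
const tracingCallback = {
  onToolStart: (name, args) => trace.push(`start ${name} ${JSON.stringify(args)}`),
  onToolEnd: (name, output) => trace.push(`end ${name} ${output}`),
};

const sum = runTool(calculatorTool, { a: 2, b: 3 }, [tracingCallback]);
console.log(sum); // 5
console.log(trace); // [ 'start add_numbers {"a":2,"b":3}', 'end add_numbers 5' ]
```

The tool itself knows nothing about tracing; the callback observes execution from the outside, which is the separation the callback mechanism is meant to provide.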
Chains
Chains are the core of LangChain workflows, allowing for the combination of multiple components in a sequence.
The simplest chain is LLMChain, which calls a language model with a prompt. You specify the LLM, a prompt template, and optionally an output parser.
```JavaScript
import { OpenAI } from "langchain/llms/openai";
import { PromptTemplate } from "langchain/prompts";
import { LLMChain } from "langchain/chains";

// Initialize the OpenAI model.
// "text-davinci-003" has been retired by OpenAI; "gpt-3.5-turbo-instruct"
// is a completion-style replacement.
const model = new OpenAI({
  modelName: "gpt-3.5-turbo-instruct",
  openAIApiKey: "YOUR_OPENAI_API_KEY",
  temperature: 0.7,
});

// Create a prompt template with one input variable
const template = "What is a good name for a company that makes {product}?";
const myPrompt = new PromptTemplate({
  template: template,
  inputVariables: ["product"],
});

// Create the LLMChain from the model and the prompt
const chain = new LLMChain({ llm: model, prompt: myPrompt });

// Run the chain; the generated text is returned under the `text` key
const result = await chain.invoke({ product: "comfortable shoes" });
console.log(result);
```
Note: This is a simplified code example based on LangChain v0.1 for illustrative purposes only. At the time of writing, the latest version available is v0.3, so parts of the above code (such as `LLMChain` and the import paths) may be deprecated or may have moved.
The LangChain Expression Language (LCEL) provides a syntax to create arbitrary custom chains. It is based on the abstraction of a `Runnable`, which is "a generic unit of work to be invoked, batched, streamed, and/or transformed."
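To make the `Runnable` idea concrete, here is a toy stand-in written in plain JavaScript. `ToyRunnable` is a hypothetical class for illustration, not the library's implementation; it is synchronous and omits streaming, whereas real runnables are asynchronous.

```JavaScript
// Toy stand-in for the Runnable abstraction: wraps a function and
// exposes invoke (one input), batch (many inputs), and pipe (composition).
class ToyRunnable {
  constructor(fn) {
    this.fn = fn;
  }
  invoke(input) {
    return this.fn(input);
  }
  batch(inputs) {
    return inputs.map((input) => this.invoke(input));
  }
  pipe(next) {
    // The output of this runnable becomes the input of the next one.
    return new ToyRunnable((input) => next.invoke(this.invoke(input)));
  }
}

// Three steps that mirror prompt -> model -> output parser.
const prompt = new ToyRunnable(
  ({ product }) => `Name a company that makes ${product}.`
);
const fakeModel = new ToyRunnable((text) => `MODEL OUTPUT for: ${text}`);
const parser = new ToyRunnable((text) => text.trim().toLowerCase());

// Compose the steps into a single chain, which is itself a runnable.
const toyChain = prompt.pipe(fakeModel).pipe(parser);

console.log(toyChain.invoke({ product: "shoes" }));
// model output for: name a company that makes shoes.
console.log(toyChain.batch([{ product: "tea" }, { product: "hats" }]));
```

Because the composed chain exposes the same methods as its parts, it can be invoked, batched, or piped into further steps without special handling.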
Agents
LangChain legacy agents use a language model as a "reasoning engine" to decide a sequence of actions to take. They are provided with user input (e.g. queries) and relevant steps executed previously. They can also interact with external resources via available tools. At the time of writing, advanced agent capabilities are moving to LangGraph.
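The agent loop described above can be sketched as follows. This is a simplified illustration in plain JavaScript: `runAgent`, `scriptedReasoner`, and the `lookup` tool are hypothetical, and the scripted reasoner stands in for a real language model deciding the next action.

```JavaScript
// Tools the agent may call, keyed by name.
const agentTools = {
  lookup: (input) => "Paris is the capital of France.",
};

// Scripted stand-in for the LLM "reasoning engine": given the question and
// the steps taken so far, it either picks a tool or finishes with an answer.
function scriptedReasoner(question, steps) {
  if (steps.length === 0) {
    return { type: "action", tool: "lookup", input: question };
  }
  const lastObservation = steps[steps.length - 1].observation;
  return { type: "finish", output: `Answer based on: ${lastObservation}` };
}

// Agent loop: ask the reasoner what to do, run the chosen tool, record the
// observation, and repeat until the reasoner finishes or the limit is hit.
function runAgent(question, reasoner, tools, maxIterations = 5) {
  const steps = [];
  for (let i = 0; i < maxIterations; i++) {
    const decision = reasoner(question, steps);
    if (decision.type === "finish") {
      return { output: decision.output, steps };
    }
    const tool = tools[decision.tool];
    const observation = tool
      ? tool(decision.input)
      : `Unknown tool: ${decision.tool}`;
    steps.push({ tool: decision.tool, input: decision.input, observation });
  }
  return { output: "Stopped: iteration limit reached.", steps };
}

const agentResult = runAgent(
  "What is the capital of France?",
  scriptedReasoner,
  agentTools
);
console.log(agentResult.output);
// Answer based on: Paris is the capital of France.
```

The iteration limit is a design choice worth keeping in any agent loop, since a model that never decides to finish would otherwise run indefinitely.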
LangChain Ecosystem
From the original LangChain framework, a broader ecosystem of tools, services, and libraries has since been developed to address the evolving needs of building production-grade LLM applications.
A diagram of the LangChain ecosystem (accessed November 2024). Source: https://js.langchain.com/docs/introduction/
LangServe
LangServe (not in the preceding diagram) is an open-source library (available in Python) for deploying LangChain runnables and chains as a REST API. It leverages FastAPI, a modern Python framework for building APIs, and Pydantic, a Python data validation library. A client is also provided, which can call runnables deployed on a server. LangServe is limited to deploying simple runnables.
LangGraph
LangGraph is an open-source library (available in JavaScript and Python) for "building stateful, multi-actor applications with LLMs" by modeling steps as edges and nodes in a graph. It supports single-agent and multi-agent workflows.
It offers the following main benefits:
- Cycles and Branching: Can define loops and conditionals in your flows.
- Persistence: Has built-in persistence and automatically saves state after each step in the graph. The graph execution can be paused and resumed at any point to support error recovery, human-in-the-loop intervention, and time travel.
- Controllability: Designed as a low-level framework, it provides fine-grained control over the state and flow of your applications.
- Integration: Works with LangChain and LangSmith, although neither is required.
Don't confuse it with LangGraph Cloud, a commercial infrastructure platform for deploying and hosting LangGraph applications in production (built on top of the open-source LangGraph framework).
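The ideas of cycles, branching, and per-step persistence can be illustrated with a toy graph runner in plain JavaScript. This is a conceptual sketch only and does not use the LangGraph API: `runGraph`, the `toyGraph` structure, and the "END" marker are assumptions made for this example.

```JavaScript
// Toy graph runner: nodes return state updates, edges choose the next node,
// and a checkpoint of the state is saved after every step.
function runGraph(graph, initialState, maxSteps = 20) {
  let state = { ...initialState };
  let current = graph.entry;
  const checkpoints = [];
  for (let step = 0; step < maxSteps && current !== "END"; step++) {
    state = { ...state, ...graph.nodes[current](state) };
    checkpoints.push({ node: current, state: { ...state } });
    current = graph.edges[current](state);
  }
  return { state, checkpoints };
}

const toyGraph = {
  entry: "draft",
  nodes: {
    // Each node receives the state and returns a partial update.
    draft: (state) => ({ text: state.text + "!", attempts: state.attempts + 1 }),
    review: (state) => ({ approved: state.attempts >= 3 }),
  },
  edges: {
    draft: () => "review",
    // Conditional edge: loop back to "draft" until the review approves.
    review: (state) => (state.approved ? "END" : "draft"),
  },
};

const graphResult = runGraph(toyGraph, {
  text: "hi",
  attempts: 0,
  approved: false,
});
console.log(graphResult.state);
// { text: 'hi!!!', attempts: 3, approved: true }
console.log(graphResult.checkpoints.length); // 6
```

The saved checkpoints are what make pausing, resuming, and inspecting earlier states possible: any entry in the list holds enough information to restart execution from that point.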
LangSmith
LangSmith is a commercial developer platform to monitor, debug, and evaluate LLM applications. It helps your LLM applications progress from prototype to production and offers easy integration with LangChain and LangGraph. With its tracing functionality, you can inspect and debug each step of your chains and agents. LangSmith provides evaluation features for your LLM applications, such as creating datasets, defining metrics, and running evaluators.
Programmatic vs. GUI Orchestration Frameworks
LangChain is a programmatic orchestration framework, as it requires installing, importing, and using libraries in your LLM application source code.
Aside from programming frameworks, there are graphical user interface (GUI) frameworks available for orchestration. For example, Flowise is an open-source tool built on top of LangChain components. Flowise can be run on your local machine, self-hosted in your cloud infrastructure, or accessed via the commercial web application Flowise Cloud. It has a simple user interface with drag-and-drop functionality for visually adding and chaining together major components.
A simple LLM chain in Flowise visual tool (accessed December 2023). Source: https://flowiseai.com/
Compared to the LLMChain code example in a prior section, the drag-and-drop UI is simpler to get started with, requiring no prior knowledge of the underlying libraries or programming syntax. If you are just getting started with LLM applications and orchestration frameworks, exploring a GUI orchestration tool is a helpful way to get familiar with the various components. A GUI orchestration tool may also suffice if your LLM application only requires straightforward chains. However, a programmatic orchestration framework may offer more fine-grained control and customization of components and workflows.
ORCHESTRATION FRAMEWORKS

| Programming Frameworks | GUI Frameworks |
| --- | --- |
| LangChain (open-source) | Flowise (open-source) |
| Haystack (open-source) | Stack AI (commercial) |
A table of available orchestration frameworks (as of November 2024, non-exhaustive). Source: Author
LangChain Advantages & Limitations
To LangChain or not to LangChain? That's the question. As with any application development framework, there are benefits and drawbacks to using an orchestration framework like LangChain.
Advantages
- Accelerated Development: The LangChain integration packages (e.g. @langchain/community) contain many integrations with popular providers (e.g. OpenAI), which can reduce development time and effort.
- Streamlined Development: The @langchain/core and langchain libraries define a common interface for components and workflows. This provides a standardized approach to composing and chaining components.
- Support: LangChain is open-source with extensive documentation and community engagement.
Limitations
- Complexity: Requires learning about the LangChain framework's concepts and components. It's not as simple as directly using the LLM APIs.
- Overhead: Involves installation and updating of the LangChain libraries. Dependency on a framework can be problematic if it becomes outdated or unsupported.
- Potential Inefficiencies: Identifying performance inefficiencies can be difficult due to the "multiple layers of abstraction and numerous components." Also, the framework's default configuration, such as the settings for token usage and API calls, may not be optimized for cost or latency.
Summary
After a more comprehensive look at the orchestration framework LangChain (components, chains, and agents) and its broader ecosystem of tools and services (LangServe, LangGraph, and LangSmith), you now have a better idea of how it may assist you in your LLM application development. For your LLM application needs, would a programmatic or GUI orchestration framework be more suitable? Go ahead and experiment with LangChain to see if it fits your development requirements.
Diana Cheung (ex-LinkedIn software engineer, USC MBA, and Codesmith alum) is a technical writer on technology and business. She is an avid learner and has a soft spot for tea and meows.