# Tiktoken Estimator Integration
Estimate token count for text using the tiktoken library, enabling accurate token counting for OpenAI models.
## Features
- **Accurate Token Counting**: Uses the official tiktoken library to provide precise token estimates
- **Multi-Model Support**: Supports various OpenAI models (gpt-3.5-turbo, gpt-4, etc.)
- **Safety Limits**: Optional safety limit checking to prevent token overages
- **Zero Configuration**: No setup required; works out of the box
- **Error Handling**: Graceful error handling with descriptive messages
## Usage
### Estimate Tokens Action
The integration provides a single action: `estimateTokens`
**Input Parameters:**
- `text` (required): The text to estimate tokens for
- `model` (optional): The OpenAI model to use for tokenization (defaults to "gpt-3.5-turbo")
- `safetyLimit` (optional): Maximum acceptable token count. If omitted, no limit is applied
**Output:**
- `tokenCount`: The estimated number of tokens in the text
- `tokenizerName`: The name of the tokenizer used
- `model`: The model the tokenization was based on
- `limitExceeded`: Whether the estimated token count exceeds the safety limit (present only when `safetyLimit` is provided)
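For orientation, here is a minimal sketch of what the action does under the hood. It assumes Python and the real `tiktoken` package; the function name `estimate_tokens` and the response-building code are illustrative, not the integration's actual implementation:

```python
import tiktoken

def estimate_tokens(text: str, model: str = "gpt-3.5-turbo",
                    safety_limit: int | None = None) -> dict:
    """Illustrative sketch of the estimateTokens action."""
    # tiktoken resolves the model name to its encoding
    # (raises KeyError for models it does not recognize).
    encoding = tiktoken.encoding_for_model(model)
    token_count = len(encoding.encode(text))  # empty text encodes to 0 tokens
    result = {
        "tokenCount": token_count,
        "tokenizerName": "tiktoken",  # underlying encoding is encoding.name, e.g. "cl100k_base"
        "model": model,
    }
    # limitExceeded appears only when a safety limit was supplied
    if safety_limit is not None:
        result["limitExceeded"] = token_count > safety_limit
    return result
```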
### Example Usage
**Basic Usage:**
```
Text: "Hello, world!"
Model: "gpt-3.5-turbo"
Result:
- tokenCount: 4
- tokenizerName: "tiktoken"
- model: "gpt-3.5-turbo"
```
**With Safety Limit:**
```
Text: "This is a longer text that might exceed our safety limit..."
Model: "gpt-3.5-turbo"
SafetyLimit: 10
Result:
- tokenCount: 15
- tokenizerName: "tiktoken"
- model: "gpt-3.5-turbo"
- limitExceeded: true
```
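The basic example can be reproduced directly against the tiktoken package (a standalone check, separate from the integration):

```python
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
tokens = encoding.encode("Hello, world!")
print(tokens)       # [9906, 11, 1917, 0] under cl100k_base
print(len(tokens))  # 4, matching tokenCount in the basic example
```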
## Supported Models
- `gpt-3.5-turbo`
- `gpt-4`
- `gpt-4-turbo`
- `text-davinci-003`
- `text-davinci-002`
- `code-davinci-002`
- And other OpenAI models supported by tiktoken
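tiktoken resolves each model name to an underlying encoding, so new models work as long as tiktoken recognizes them. A quick way to inspect the mapping:

```python
import tiktoken

for model in ["gpt-3.5-turbo", "gpt-4", "text-davinci-003"]:
    encoding = tiktoken.encoding_for_model(model)
    print(f"{model} -> {encoding.name}")
# gpt-3.5-turbo -> cl100k_base
# gpt-4 -> cl100k_base
# text-davinci-003 -> p50k_base
```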
## Recommended Safety Limits
When setting safety limits, remember that your actual API calls will also include tokens for system prompts, conversation history, and response generation. The tiers below are starting points, ordered from most to least headroom:
### GPT-3.5-Turbo (4,096 token limit)
- **Conservative**: 2,500 tokens (leaves ~1,600 for system prompts + response)
- **Moderate**: 3,000 tokens (leaves ~1,100 for system prompts + response)
- **Aggressive**: 3,500 tokens (leaves ~600 for system prompts + response)
### GPT-4 (8,192 token limit)
- **Conservative**: 5,000 tokens (leaves ~3,200 for system prompts + response)
- **Moderate**: 6,000 tokens (leaves ~2,200 for system prompts + response)
- **Aggressive**: 7,000 tokens (leaves ~1,200 for system prompts + response)
### GPT-4 Turbo (128,000 token limit)
- **Conservative**: 100,000 tokens (leaves ~28,000 for system prompts + response)
- **Moderate**: 110,000 tokens (leaves ~18,000 for system prompts + response)
- **Aggressive**: 120,000 tokens (leaves ~8,000 for system prompts + response)
**Note**: These recommendations assume typical system prompt sizes (200-800 tokens) and desired response lengths (500-2,000 tokens). Adjust based on your specific use case.
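These tiers reduce to simple arithmetic: context window minus the tokens reserved for the system prompt and the response. A hypothetical helper for deriving your own limit (the context-window values mirror the headings above):

```python
# Context windows as listed in the headings above; adjust for your deployment.
CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 4_096,
    "gpt-4": 8_192,
    "gpt-4-turbo": 128_000,
}

def derive_safety_limit(model: str, system_prompt_tokens: int = 800,
                        response_tokens: int = 2_000) -> int:
    """Illustrative: reserve room for the system prompt and the response."""
    return CONTEXT_WINDOWS[model] - system_prompt_tokens - response_tokens

print(derive_safety_limit("gpt-4"))  # 8192 - 800 - 2000 = 5392
```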
## Error Handling
The integration handles various error scenarios:
- **Invalid Input**: Returns clear error messages for missing or invalid text
- **Empty Text**: Returns 0 tokens for empty strings
- **Unsupported Model**: Returns error for models not supported by tiktoken
- **Tokenization Errors**: Handles tiktoken library errors gracefully
- **Safety Limit Warnings**: Logs warnings when token counts exceed safety limits
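A sketch of how two of these cases map onto tiktoken's actual behavior (`encoding_for_model` raises `KeyError` for unrecognized models); the helper names are illustrative:

```python
import logging
import tiktoken

def lookup_encoding(model: str) -> tiktoken.Encoding:
    """Resolve a model to its encoding, with a descriptive error for unknown models."""
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        raise ValueError(f"Model '{model}' is not supported by tiktoken") from None

def check_safety_limit(token_count: int, safety_limit: int) -> bool:
    """Log a warning (rather than fail) when the estimate exceeds the limit."""
    exceeded = token_count > safety_limit
    if exceeded:
        logging.warning("Estimated %d tokens exceeds safety limit of %d",
                        token_count, safety_limit)
    return exceeded
```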
## Benefits
- **Cost Optimization**: Estimate token costs before making API calls
- **Rate Limiting**: Manage token budgets and prevent overages with safety limits
- **Workflow Logic**: Enable conditional logic based on token counts and safety thresholds
- **Transparency**: Provide visibility into token usage patterns
- **Proactive Monitoring**: Set safety limits to catch potential token overages early