Replies: 4 comments 2 replies
The max_tokens parameter should work with AsyncOpenAI. Can you check a few things? First, verify that it is actually limiting the response: if max_tokens is working, the response's usage.completion_tokens should not exceed the cap, and finish_reason will be "length" when the cap cuts generation off. If that is not what you see, a couple of things might be happening. What model are you using, and what token count are you seeing in the response?
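A minimal sketch of that check (the demo() coroutine, model name, and prompt are illustrative assumptions; usage.completion_tokens and finish_reason are standard fields on chat.completions responses):

```python
import asyncio

def hit_limit(finish_reason: str, completion_tokens: int, cap: int) -> bool:
    """True if the response was cut off by the token cap ("length")
    rather than finishing naturally ("stop")."""
    return finish_reason == "length" and completion_tokens >= cap

async def demo() -> None:
    from openai import AsyncOpenAI  # requires the openai package and an API key

    client = AsyncOpenAI()
    cap = 50
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: substitute the model you are using
        messages=[{"role": "user", "content": "Explain asyncio in detail."}],
        max_tokens=cap,
    )
    choice = response.choices[0]
    print(choice.finish_reason)              # "length" if the cap was hit
    print(response.usage.completion_tokens)  # should be <= cap
    print(hit_limit(choice.finish_reason, response.usage.completion_tokens, cap))

# asyncio.run(demo())
```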
Token limiting with AsyncOpenAI is straightforward. At RevolutionAI (https://revolutionai.io) we use this pattern:

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()
response = await client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=500,  # limits output tokens
)
```

Note that newer models expect max_completion_tokens instead of max_tokens, and the two should not be passed together in one request. For streaming, the limit still applies; you will just get fewer chunks, since the model stops generating when it hits the limit.
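To watch the cap being respected while streaming, one approach is a sketch like the following (stream_demo() and the model name are assumptions; stream_options={"include_usage": True} asks the API to attach token usage to the final chunk):

```python
import asyncio

def join_deltas(parts):
    """Join streamed delta contents, skipping the None values that
    appear on role-only or usage-only chunks."""
    return "".join(p for p in parts if p)

async def stream_demo() -> None:
    from openai import AsyncOpenAI  # requires the openai package and an API key

    client = AsyncOpenAI()
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: substitute the model you are using
        messages=[{"role": "user", "content": "Tell a long story."}],
        max_tokens=100,
        stream=True,
        stream_options={"include_usage": True},  # final chunk carries usage stats
    )
    parts = []
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
        if chunk.usage is not None:  # only set on the last chunk
            print(chunk.usage.completion_tokens)  # should be <= 100
    print(join_deltas(parts))

# asyncio.run(stream_demo())
```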
Limiting output tokens with AsyncOpenAI! At RevolutionAI (https://revolutionai.io) we optimize API usage. Solution:

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()
response = await client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=500,  # limit output tokens (newer models use max_completion_tokens instead)
)
```

Additional controls:

```python
response = await client.chat.completions.create(
    model="gpt-4-turbo",
    messages=messages,
    max_tokens=500,
    stop=["\n\n", "END"],  # stop sequences end generation early
    temperature=0.7,
)
```

For streaming:

```python
stream = await client.chat.completions.create(
    model="gpt-4-turbo",
    messages=messages,
    max_tokens=500,
    stream=True,
)
async for chunk in stream:
    # still respects max_tokens; content can be None on some chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
I would like to bound the maximum number of tokens of the response to help with formatting when working with AsyncOpenAI, but I could not find anything about it.
I tried:
but the max_tokens argument is seemingly ignored.