Overview
When calling the ChatGPT API with streaming output, network problems often cause timeouts. Interestingly, the author observed that timeouts hit during local debugging recover automatically after about 10 minutes, while the same requests made from a server often fail with a timeout error (HTTP 502).
The likely reason local requests recover is an automatic retry mechanism in the project (the ChatGPT API itself does not retry). The server returns 502 because the response needs to pass through a gateway layer, and the gateway's timeout threshold is shorter than the automatic retry interval (10 minutes), so the gateway times out before the retry completes.
This article addresses approaches to mitigate ChatGPT API call timeouts.
Goals
- Avoid showing timeout error messages to users.
- Shorten the retry time interval after a timeout.
Approach
Two approaches were considered.
1. Fix the underlying network issues
This is difficult. Timeouts are often caused by the OpenAI server side, and they can occur even from servers deployed outside of China.
2. Use automatic retries
Adjusting timeout parameters to enable faster retries proved feasible.
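The idea behind approach 2 can be sketched with plain asyncio, independent of any OpenAI client code: bound each attempt with a short timeout and retry immediately, instead of sitting in one long (e.g. 10-minute) wait. Everything below (call_with_retries, flaky_api) is a hypothetical stand-in, not the project's actual retry mechanism:

```python
import asyncio

async def call_with_retries(make_call, attempts=3, per_try_timeout=0.1):
    # Bound each attempt with a short timeout and retry right away,
    # rather than waiting out one long timeout.
    last_exc = None
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), per_try_timeout)
        except asyncio.TimeoutError as exc:
            last_exc = exc  # this attempt timed out; try again
    raise last_exc

calls = {"n": 0}

async def flaky_api():
    # Simulated API call: the first attempt hangs, the second answers.
    calls["n"] += 1
    if calls["n"] == 1:
        await asyncio.sleep(10)
    return "ok"

result = asyncio.run(call_with_retries(flaky_api))
print(result)  # ok
```

With a 0.1 s per-attempt budget, the hung first attempt is cancelled and the second attempt succeeds in well under a second, which is exactly the "shorter retry interval" the goals ask for.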
Implementation
The author adjusted timeout settings in two steps, from simple to more advanced. If you want the final solution directly, skip to "Solution 2".
Environment
- Python: 3.10.7
- openai: 0.27.6
Call method
openai.api_resources.chat_completion.ChatCompletion.acreate (this is the asynchronous method to call ChatGPT).
Call chain and timeout mapping
# Method -> timeout-related parameters
openai.api_resources.chat_completion.ChatCompletion.acreate -> kwargs
openai.api_resources.abstract.engine_api_resource.EngineAPIResource.acreate -> params
openai.api_requestor.APIRequestor.arequest -> request_timeout
# request_timeout becomes timeout at this step, so passing request_timeout is sufficient
openai.api_requestor.APIRequestor.arequest_raw -> request_timeout
aiohttp.client.ClientSession.request -> kwargs
aiohttp.client.ClientSession._request -> timeout
tm = TimeoutHandle(self._loop, real_timeout.total) -> ClientTimeout.total
async with ceil_timeout(real_timeout.connect): -> ClientTimeout.connect
# Sub-branch 1
aiohttp.connector.BaseConnector.connect -> timeout
aiohttp.connector.TCPConnector._create_connection -> timeout
aiohttp.connector.TCPConnector._create_direct_connection -> timeout
aiohttp.connector.TCPConnector._wrap_create_connection -> timeout
async with ceil_timeout(timeout.sock_connect): -> ClientTimeout.sock_connect
# Sub-branch 2
aiohttp.client_reqrep.ClientRequest.send -> timeout
aiohttp.client_proto.ResponseHandler.set_response_params -> read_timeout
aiohttp.client_proto.ResponseHandler._reschedule_timeout -> self._read_timeout
if timeout:
    self._read_timeout_handle = self._loop.call_later(
        timeout, self._on_read_timeout
    ) -> ClientTimeout.sock_read
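The last step of sub-branch 2 is worth dwelling on: the handler re-arms a loop.call_later timer every time data arrives, so ClientTimeout.sock_read bounds the gap between reads, not the whole transfer. A minimal stand-in for that rescheduling pattern (ReadTimeoutWatchdog is a name invented here, not aiohttp code):

```python
import asyncio

class ReadTimeoutWatchdog:
    # Minimal stand-in for the _reschedule_timeout pattern: each
    # received chunk re-arms a loop.call_later timer, so the callback
    # fires only after `timeout` seconds with *no* incoming data.
    def __init__(self, loop, timeout, on_timeout):
        self._loop = loop
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._handle = None

    def data_received(self):
        if self._handle is not None:
            self._handle.cancel()
        if self._timeout:
            self._handle = self._loop.call_later(self._timeout, self._on_timeout)

async def demo():
    fired = []
    dog = ReadTimeoutWatchdog(
        asyncio.get_running_loop(), 0.2, lambda: fired.append(True)
    )
    for _ in range(3):           # data keeps arriving: timer keeps resetting
        dog.data_received()
        await asyncio.sleep(0.05)
    quiet_before = list(fired)   # still inside the window: nothing fired yet
    await asyncio.sleep(0.5)     # silence longer than the timeout: it fires
    return quiet_before, fired

quiet_before, fired = asyncio.run(demo())
print(quiet_before, fired)  # [] [True]
```

This is why sock_read is the right knob for a stalled stream: steady chunks keep pushing the deadline back, and only true silence trips it.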
Solution 1
The APIRequestor.arequest_raw method accepts a request_timeout parameter that can be given either as a single number (the total timeout) or as a (connect, total) tuple. Calling ChatCompletion.acreate with request_timeout set to (10, 300) is therefore possible.
# async def arequest_raw(
#     self,
#     method,
#     url,
#     session,
#     *,
#     params=None,
#     supplied_headers: Optional[Dict[str, str]] = None,
#     files=None,
#     request_id: Optional[str] = None,
#     request_timeout: Optional[Union[float, Tuple[float, float]]] = None,
# ) -> aiohttp.ClientResponse:
abs_url, headers, data = self._prepare_request_raw(
    url, supplied_headers, method, params, files, request_id
)
if isinstance(request_timeout, tuple):
    timeout = aiohttp.ClientTimeout(
        connect=request_timeout[0],
        total=request_timeout[1],
    )
else:
    timeout = aiohttp.ClientTimeout(
        total=request_timeout if request_timeout else TIMEOUT_SECS
    )
...
This approach helps but does not fully eliminate timeout errors: it bounds the connection time and the total request time, but not the time until the first byte is read. With total=300s, a stalled response can sit for up to 300 seconds before aiohttp gives up, and the gateway in front of the service times out long before that.
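The distinction can be reproduced with plain asyncio, no aiohttp required. The sketch below (slow_stream and read_stream are hypothetical names) bounds the wait for each chunk, which is what sock_read gives and total does not:

```python
import asyncio

async def slow_stream(first_chunk_delay):
    # Simulated streaming response: the first chunk may arrive late,
    # later chunks arrive almost immediately.
    await asyncio.sleep(first_chunk_delay)
    for i in range(3):
        yield f"chunk-{i}"
        await asyncio.sleep(0.01)

async def read_stream(stream, per_chunk_timeout):
    # Bound the wait for *each* chunk (the role sock_read plays in
    # aiohttp) rather than the whole transfer (the role of total).
    chunks = []
    it = stream.__aiter__()
    while True:
        try:
            chunks.append(await asyncio.wait_for(it.__anext__(), per_chunk_timeout))
        except StopAsyncIteration:
            return chunks

# A healthy stream completes well inside the per-chunk budget.
ok = asyncio.run(read_stream(slow_stream(0.0), per_chunk_timeout=1.0))

# A stream whose first byte is 0.5s late fails a 0.1s per-chunk budget,
# even though a 300s total budget would never have fired.
try:
    asyncio.run(read_stream(slow_stream(0.5), per_chunk_timeout=0.1))
    timed_out = False
except asyncio.TimeoutError:
    timed_out = True

print(ok, timed_out)  # ['chunk-0', 'chunk-1', 'chunk-2'] True
```

A generous total budget and a tight per-read budget are therefore complementary settings, which is what Solution 2 exposes.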
This led to Solution 2.
Solution 2
Monkey patch the openai.api_requestor.APIRequestor.arequest_raw method, extending the request_timeout parameter so it can also carry the native aiohttp.ClientTimeout attributes sock_read and sock_connect.
1. Create api_requestor_mp.py and add the following code
import asyncio
from typing import Dict, Optional, Union

import aiohttp
import requests

import openai
from openai import error, util
from openai.api_requestor import APIRequestor, TIMEOUT_SECS, _aiohttp_proxies_arg


# Note: the request_timeout parameter type has changed
# Optional[Union[float, Tuple[float, float]]] -> Optional[Union[float, tuple]]
async def arequest_raw(
    self,
    method,
    url,
    session,
    *,
    params=None,
    supplied_headers: Optional[Dict[str, str]] = None,
    files=None,
    request_id: Optional[str] = None,
    request_timeout: Optional[Union[float, tuple]] = None,
) -> aiohttp.ClientResponse:
    abs_url, headers, data = self._prepare_request_raw(
        url, supplied_headers, method, params, files, request_id
    )
    # Determine the type of request_timeout and set sock_read and
    # sock_connect as needed
    if isinstance(request_timeout, tuple):
        timeout = aiohttp.ClientTimeout(
            connect=request_timeout[0],
            total=request_timeout[1],
            sock_read=None if len(request_timeout) < 3 else request_timeout[2],
            sock_connect=None if len(request_timeout) < 4 else request_timeout[3],
        )
    else:
        timeout = aiohttp.ClientTimeout(
            total=request_timeout if request_timeout else TIMEOUT_SECS
        )
    if files:
        # TODO: Use aiohttp.MultipartWriter to create the multipart form data here.
        # For now we use the private requests method that is known to have worked so far.
        data, content_type = requests.models.RequestEncodingMixin._encode_files(  # type: ignore
            files, data
        )
        headers["Content-Type"] = content_type
    request_kwargs = {
        "method": method,
        "url": abs_url,
        "headers": headers,
        "data": data,
        "proxy": _aiohttp_proxies_arg(openai.proxy),
        "timeout": timeout,
    }
    try:
        result = await session.request(**request_kwargs)
        util.log_info(
            "OpenAI API response",
            path=abs_url,
            response_code=result.status,
            processing_ms=result.headers.get("OpenAI-Processing-Ms"),
            request_id=result.headers.get("X-Request-Id"),
        )
        # Don't read the whole stream for debug logging unless necessary.
        if openai.log == "debug":
            util.log_debug(
                "API response body", body=result.content, headers=result.headers
            )
        return result
    except (aiohttp.ServerTimeoutError, asyncio.TimeoutError) as e:
        raise error.Timeout("Request timed out") from e
    except aiohttp.ClientError as e:
        raise error.APIConnectionError("Error communicating with OpenAI") from e


def monkey_patch():
    APIRequestor.arequest_raw = arequest_raw
2. Apply the monkey patch during initialization
from your.package.path.api_requestor_mp import monkey_patch

monkey_patch()
After setting request_timeout=(10, 300, 15, 10), that is, 10 s to connect, 300 s total, 15 s per socket read, and 10 s for the socket connect, subsequent debugging showed no further timeout issues.
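For reference, the tuple handling in the patched branch can be isolated as a plain function (resolve_timeout is a name invented here; TIMEOUT_SECS is the library's default of 600 in openai 0.27.x). The four positions map to connect, total, sock_read and sock_connect:

```python
from typing import Optional, Union

TIMEOUT_SECS = 600  # default total timeout in openai 0.27.x

def resolve_timeout(request_timeout: Optional[Union[float, tuple]]) -> dict:
    # Same branch as the patched arequest_raw, returning the keyword
    # arguments that would be passed to aiohttp.ClientTimeout.
    if isinstance(request_timeout, tuple):
        return {
            "connect": request_timeout[0],
            "total": request_timeout[1],
            "sock_read": None if len(request_timeout) < 3 else request_timeout[2],
            "sock_connect": None if len(request_timeout) < 4 else request_timeout[3],
        }
    return {"total": request_timeout if request_timeout else TIMEOUT_SECS}

print(resolve_timeout((10, 300, 15, 10)))
# {'connect': 10, 'total': 300, 'sock_read': 15, 'sock_connect': 10}
print(resolve_timeout(None))
# {'total': 600}
```

Shorter tuples simply leave the trailing fields at None, so existing (connect, total) callers keep working unchanged.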
Acceptance testing passed.
Lessons learned
- Tracing the call chain directly from the code can be difficult; using exception stacks to find the call chain is often more convenient.
- The request_timeout parameter exposed by the OpenAI client does not cover every scenario and may need to be rewritten; searching for existing monkey patch solutions is a practical shortcut.
- Changing the code is not the hardest part; understanding what to change, how, and why is the real challenge.