Overview
When calling the ChatGPT API with streaming output, network problems often cause timeouts. Interestingly, the author observed that timeouts hit during local debugging recover automatically after about 10 minutes, while the same requests made from a server often fail with a timeout error (HTTP 502).
The likely reason local requests recover is an automatic retry mechanism in the project (the ChatGPT API itself does not retry). The server returns 502 because the response needs to pass through a gateway layer, and the gateway's timeout threshold is shorter than the automatic retry interval (10 minutes), so the gateway times out before the retry completes.
This article addresses approaches to mitigate ChatGPT API call timeouts.
Goals
- Avoid showing timeout error messages to users.
- Shorten the retry time interval after a timeout.
Approach
Two approaches were considered.
1. Fix the underlying network issues
This is difficult. Timeouts are often caused by the OpenAI server side, and they can occur even from servers deployed outside of China.
2. Use automatic retries
Adjusting timeout parameters to enable faster retries proved feasible.
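The idea behind approach 2 can be sketched with plain asyncio, independent of any OpenAI client code: bound each attempt with a short timeout and retry immediately, instead of sitting in one long (e.g. 10-minute) wait. Everything below (call_with_retries, flaky_api) is a hypothetical stand-in, not the project's actual retry mechanism:

```python
import asyncio

async def call_with_retries(make_call, attempts=3, per_try_timeout=0.1):
    # Bound each attempt with a short timeout and retry right away,
    # rather than waiting out one long timeout.
    last_exc = None
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), per_try_timeout)
        except asyncio.TimeoutError as exc:
            last_exc = exc  # this attempt timed out; try again
    raise last_exc

calls = {"n": 0}

async def flaky_api():
    # Simulated API call: the first attempt hangs, the second answers.
    calls["n"] += 1
    if calls["n"] == 1:
        await asyncio.sleep(10)
    return "ok"

result = asyncio.run(call_with_retries(flaky_api))
print(result)  # ok
```

With a 0.1 s per-attempt budget, the hung first attempt is cancelled and the second attempt succeeds in well under a second, which is exactly the "shorter retry interval" the goals ask for.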
Implementation
The author adjusted timeout settings in two steps, from simple to more advanced. If you want the final solution directly, skip to "Solution 2".
Environment
- Python: 3.10.7
- openai: 0.27.6
Call method
openai.api_resources.chat_completion.ChatCompletion.acreate (this is the asynchronous method to call ChatGPT).
Call chain and timeout mapping
# Method -> timeout-related parameters
openai.api_resources.chat_completion.ChatCompletion.acreate -> kwargs
openai.api_resources.abstract.engine_api_resource.EngineAPIResource.acreate -> params
openai.api_requestor.APIRequestor.arequest -> request_timeout
# request_timeout becomes timeout at this step, so passing request_timeout is sufficient
openai.api_requestor.APIRequestor.arequest_raw -> request_timeout
aiohttp.client.ClientSession.request -> kwargs
aiohttp.client.ClientSession._request -> timeout
tm = TimeoutHandle(self._loop, real_timeout.total) -> ClientTimeout.total
async with ceil_timeout(real_timeout.connect): -> ClientTimeout.connect
# Sub-branch 1
aiohttp.connector.BaseConnector.connect -> timeout
aiohttp.connector.TCPConnector._create_connection -> timeout
aiohttp.connector.TCPConnector._create_direct_connection -> timeout
aiohttp.connector.TCPConnector._wrap_create_connection -> timeout
async with ceil_timeout(timeout.sock_connect): -> ClientTimeout.sock_connect
# Sub-branch 2
aiohttp.client_reqrep.ClientRequest.send -> timeout
aiohttp.client_proto.ResponseHandler.set_response_params -> read_timeout
aiohttp.client_proto.ResponseHandler._reschedule_timeout -> self._read_timeout
if timeout:
    self._read_timeout_handle = self._loop.call_later(
        timeout, self._on_read_timeout
    ) -> ClientTimeout.sock_read
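The last step of sub-branch 2 is worth dwelling on: the handler re-arms a loop.call_later timer every time data arrives, so ClientTimeout.sock_read bounds the gap between reads, not the whole transfer. A minimal stand-in for that rescheduling pattern (ReadTimeoutWatchdog is a name invented here, not aiohttp code):

```python
import asyncio

class ReadTimeoutWatchdog:
    # Minimal stand-in for the _reschedule_timeout pattern: each
    # received chunk re-arms a loop.call_later timer, so the callback
    # fires only after `timeout` seconds with *no* incoming data.
    def __init__(self, loop, timeout, on_timeout):
        self._loop = loop
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._handle = None

    def data_received(self):
        if self._handle is not None:
            self._handle.cancel()
        if self._timeout:
            self._handle = self._loop.call_later(self._timeout, self._on_timeout)

async def demo():
    fired = []
    dog = ReadTimeoutWatchdog(
        asyncio.get_running_loop(), 0.2, lambda: fired.append(True)
    )
    for _ in range(3):           # data keeps arriving: timer keeps resetting
        dog.data_received()
        await asyncio.sleep(0.05)
    quiet_before = list(fired)   # still inside the window: nothing fired yet
    await asyncio.sleep(0.5)     # silence longer than the timeout: it fires
    return quiet_before, fired

quiet_before, fired = asyncio.run(demo())
print(quiet_before, fired)  # [] [True]
```

This is why sock_read is the right knob for a stalled stream: steady chunks keep pushing the deadline back, and only true silence trips it.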
Solution 1
The APIRequestor.arequest_raw method accepts a request_timeout parameter that can be given either as a single number (the total timeout) or as a (connect, total) tuple. Calling ChatCompletion.acreate with request_timeout set to (10, 300) is therefore possible.
# async def arequest_raw(
#     self,
#     method,
#     url,
#     session,
#     *,
#     params=None,
#     supplied_headers: Optional[Dict[str, str]] = None,
#     files=None,
#     request_id: Optional[str] = None,
#     request_timeout: Optional[Union[float, Tuple[float, float]]] = None,
# ) -> aiohttp.ClientResponse:
abs_url, headers, data = self._prepare_request_raw(
    url, supplied_headers, method, params, files, request_id
)
if isinstance(request_timeout, tuple):
    timeout = aiohttp.ClientTimeout(
        connect=request_timeout[0],
        total=request_timeout[1],
    )
else:
    timeout = aiohttp.ClientTimeout(
        total=request_timeout if request_timeout else TIMEOUT_SECS
    )
...
This approach helps but does not fully eliminate timeout errors: it bounds the connection time and the total request time, but not the time until the first byte is read. With total=300s, a stalled response can sit for up to 300 seconds before aiohttp gives up, and the gateway in front of the service times out long before that.
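The distinction can be reproduced with plain asyncio, no aiohttp required. The sketch below (slow_stream and read_stream are hypothetical names) bounds the wait for each chunk, which is what sock_read gives and total does not:

```python
import asyncio

async def slow_stream(first_chunk_delay):
    # Simulated streaming response: the first chunk may arrive late,
    # later chunks arrive almost immediately.
    await asyncio.sleep(first_chunk_delay)
    for i in range(3):
        yield f"chunk-{i}"
        await asyncio.sleep(0.01)

async def read_stream(stream, per_chunk_timeout):
    # Bound the wait for *each* chunk (the role sock_read plays in
    # aiohttp) rather than the whole transfer (the role of total).
    chunks = []
    it = stream.__aiter__()
    while True:
        try:
            chunks.append(await asyncio.wait_for(it.__anext__(), per_chunk_timeout))
        except StopAsyncIteration:
            return chunks

# A healthy stream completes well inside the per-chunk budget.
ok = asyncio.run(read_stream(slow_stream(0.0), per_chunk_timeout=1.0))

# A stream whose first byte is 0.5s late fails a 0.1s per-chunk budget,
# even though a 300s total budget would never have fired.
try:
    asyncio.run(read_stream(slow_stream(0.5), per_chunk_timeout=0.1))
    timed_out = False
except asyncio.TimeoutError:
    timed_out = True

print(ok, timed_out)  # ['chunk-0', 'chunk-1', 'chunk-2'] True
```

A generous total budget and a tight per-read budget are therefore complementary settings, which is what Solution 2 exposes.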
This led to Solution 2.
Solution 2
Monkey patch the openai.api_requestor.APIRequestor.arequest_raw method, extending the request_timeout parameter so it can also carry the native aiohttp.ClientTimeout attributes sock_read and sock_connect.
1. Create api_requestor_mp.py and add the following code
import asyncio
from typing import Dict, Optional, Union

import aiohttp
import requests

import openai
from openai import error, util
from openai.api_requestor import APIRequestor, TIMEOUT_SECS, _aiohttp_proxies_arg


# Note: the request_timeout parameter type has changed
# Optional[Union[float, Tuple[float, float]]] -> Optional[Union[float, tuple]]
async def arequest_raw(
    self,
    method,
    url,
    session,
    *,
    params=None,
    supplied_headers: Optional[Dict[str, str]] = None,
    files=None,
    request_id: Optional[str] = None,
    request_timeout: Optional[Union[float, tuple]] = None,
) -> aiohttp.ClientResponse:
    abs_url, headers, data = self._prepare_request_raw(
        url, supplied_headers, method, params, files, request_id
    )
    # Determine the type of request_timeout and set sock_read and
    # sock_connect as needed
    if isinstance(request_timeout, tuple):
        timeout = aiohttp.ClientTimeout(
            connect=request_timeout[0],
            total=request_timeout[1],
            sock_read=None if len(request_timeout) < 3 else request_timeout[2],
            sock_connect=None if len(request_timeout) < 4 else request_timeout[3],
        )
    else:
        timeout = aiohttp.ClientTimeout(
            total=request_timeout if request_timeout else TIMEOUT_SECS
        )
    if files:
        # TODO: Use aiohttp.MultipartWriter to create the multipart form data here.
        # For now we use the private requests method that is known to have worked so far.
        data, content_type = requests.models.RequestEncodingMixin._encode_files(  # type: ignore
            files, data
        )
        headers["Content-Type"] = content_type
    request_kwargs = {
        "method": method,
        "url": abs_url,
        "headers": headers,
        "data": data,
        "proxy": _aiohttp_proxies_arg(openai.proxy),
        "timeout": timeout,
    }
    try:
        result = await session.request(**request_kwargs)
        util.log_info(
            "OpenAI API response",
            path=abs_url,
            response_code=result.status,
            processing_ms=result.headers.get("OpenAI-Processing-Ms"),
            request_id=result.headers.get("X-Request-Id"),
        )
        # Don't read the whole stream for debug logging unless necessary.
        if openai.log == "debug":
            util.log_debug(
                "API response body", body=result.content, headers=result.headers
            )
        return result
    except (aiohttp.ServerTimeoutError, asyncio.TimeoutError) as e:
        raise error.Timeout("Request timed out") from e
    except aiohttp.ClientError as e:
        raise error.APIConnectionError("Error communicating with OpenAI") from e


def monkey_patch():
    APIRequestor.arequest_raw = arequest_raw
2. Apply the monkey patch during initialization
from your.package.path.api_requestor_mp import monkey_patch

monkey_patch()
After setting request_timeout=(10, 300, 15, 10), that is, 10 s to connect, 300 s total, 15 s per socket read, and 10 s for the socket connect, subsequent debugging showed no further timeout issues.
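For reference, the tuple handling in the patched branch can be isolated as a plain function (resolve_timeout is a name invented here; TIMEOUT_SECS is the library's default of 600 in openai 0.27.x). The four positions map to connect, total, sock_read and sock_connect:

```python
from typing import Optional, Union

TIMEOUT_SECS = 600  # default total timeout in openai 0.27.x

def resolve_timeout(request_timeout: Optional[Union[float, tuple]]) -> dict:
    # Same branch as the patched arequest_raw, returning the keyword
    # arguments that would be passed to aiohttp.ClientTimeout.
    if isinstance(request_timeout, tuple):
        return {
            "connect": request_timeout[0],
            "total": request_timeout[1],
            "sock_read": None if len(request_timeout) < 3 else request_timeout[2],
            "sock_connect": None if len(request_timeout) < 4 else request_timeout[3],
        }
    return {"total": request_timeout if request_timeout else TIMEOUT_SECS}

print(resolve_timeout((10, 300, 15, 10)))
# {'connect': 10, 'total': 300, 'sock_read': 15, 'sock_connect': 10}
print(resolve_timeout(None))
# {'total': 600}
```

Shorter tuples simply leave the trailing fields at None, so existing (connect, total) callers keep working unchanged.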
Acceptance testing passed.
Lessons learned
- Tracing the call chain directly from the code can be difficult; using exception stacks to find the call chain is often more convenient.
- The request_timeout parameter exposed by the OpenAI client does not cover every scenario and may need to be rewritten; searching for existing monkey patch solutions is a practical shortcut.
- Changing the code is not the hardest part; understanding what to change, how, and why is the real challenge.