
RecursionError: Maximum recursion depth exceeded #4061

Open
MustaphaU opened this issue Mar 22, 2024 · 23 comments
Assignees
Labels
bug This issue is a confirmed bug. p2 This is a standard priority issue s3

Comments

@MustaphaU

Describe the bug

I need help with a `RecursionError: maximum recursion depth exceeded` raised from boto3. It occurs when I initialize an S3 client in my inference script so that I can read S3 objects. Your insights will be deeply appreciated! A similar issue was posted on Stack Overflow two months ago: https://stackoverflow.com/questions/77786275/aws-sagemaker-endpoint-maximum-recursion-depth-exceeded-error-when-calling-boto

Here is the relevant code block responsible for the error:

def get_video_bytes_from_s3(bucket_name, key):
    s3_client = boto3.client('s3')
    try:
        video_object = s3_client.get_object(Bucket=bucket_name, Key=key)
        video_bytes = video_object['Body'].read()
        return video_bytes
    except Exception as e:
        print(f"Failed to fetch video from S3: {e}")

Expected Behavior

The S3 client is created successfully, enabling access to the S3 objects.

Current Behavior

Here is the full error log:

Traceback (most recent call last):
  File "/sagemaker/python_service.py", line 423, in _handle_invocation_post
    res.body, res.content_type = handlers(data, context)
  File "/opt/ml/model/code/inference.py", line 156, in handler
    video_bytes = get_video_bytes_from_s3(key)
  File "/opt/ml/model/code/inference.py", line 16, in get_video_bytes_from_s3
    s3_client = boto3.client('s3')
  File "/usr/local/lib/python3.10/site-packages/boto3/__init__.py", line 92, in client
    return _get_default_session().client(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/boto3/session.py", line 299, in client
    return self._session.create_client(
  File "/usr/local/lib/python3.10/site-packages/botocore/session.py", line 997, in create_client
    client = client_creator.create_client(
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 159, in create_client
    client_args = self._get_client_args(
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 490, in _get_client_args
    return args_creator.get_client_args(
  File "/usr/local/lib/python3.10/site-packages/botocore/args.py", line 137, in get_client_args
    endpoint = endpoint_creator.create_endpoint(
  File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 409, in create_endpoint
    http_session = http_session_cls(
  File "/usr/local/lib/python3.10/site-packages/botocore/httpsession.py", line 323, in __init__
    self._manager = PoolManager(**self._get_pool_manager_kwargs())
  File "/usr/local/lib/python3.10/site-packages/botocore/httpsession.py", line 341, in _get_pool_manager_kwargs
    'ssl_context': self._get_ssl_context(),
  File "/usr/local/lib/python3.10/site-packages/botocore/httpsession.py", line 350, in _get_ssl_context
    return create_urllib3_context()
  File "/usr/local/lib/python3.10/site-packages/botocore/httpsession.py", line 139, in create_urllib3_context
    context.options |= options
  File "/usr/local/lib/python3.10/ssl.py", line 620, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/local/lib/python3.10/ssl.py", line 620, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/local/lib/python3.10/ssl.py", line 620, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  [Previous line repeated 479 more times]

Reproduction Steps

Simply initializing an S3 client within an inference script, like so:

s3_client = boto3.client('s3')

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.34.55

Environment details (OS name and version, etc.)

Sagemaker endpoint for Tensorflow serving

@MustaphaU MustaphaU added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Mar 22, 2024
@avishwanathan88

avishwanathan88 commented Mar 30, 2024

I am suddenly facing this maximum recursion depth issue as well when trying to check whether an object exists in the S3 bucket using

s3_client.head_object(Bucket=bucket_name, Key=key)

It used to work before, but I'm not sure if something changed.
The S3 client is created using:

boto3.client(service_name='s3',
             use_ssl=False,
             region_name=region,
             endpoint_url=endpoint_url,
             aws_access_key_id=key_id,
             aws_secret_access_key=access_key,
             config=Config(
                 s3={'addressing_style': 'path'},
                 signature_version='s3v4'))

@RyanFitzSimmonsAK RyanFitzSimmonsAK self-assigned this May 7, 2024
@RyanFitzSimmonsAK RyanFitzSimmonsAK added investigating This issue is being investigated and/or work is in progress to resolve the issue. s3 p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels May 7, 2024
@RyanFitzSimmonsAK
Contributor

Hi @MustaphaU, thanks for reaching out. If you limit the script to only be initializing a client (no actual operations), do you still have this behavior? In other words, what is the minimum reproducible code snippet that produces this recursion depth error? Thanks!

@RyanFitzSimmonsAK RyanFitzSimmonsAK added response-requested Waiting on additional information or feedback. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels May 14, 2024
@MustaphaU
Author

MustaphaU commented May 14, 2024

Hi @MustaphaU, thanks for reaching out. If you limit the script to only be initializing a client (no actual operations), do you still have this behavior? In other words, what is the minimum reproducible code snippet that produces this recursion depth error? Thanks!

Hi @RyanFitzSimmonsAK
Just initializing the s3 client in my inference script like below is enough to reproduce the error

s3_client = boto3.client('s3')

Thank you.

Edit: The error persists. Apologies for the back and forth. Yes, s3_client=boto3.client('s3') should produce the error. I just tested now and got the error.

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label May 15, 2024
@MustaphaU
Author

@RyanFitzSimmonsAK

Please see the attached CloudWatch logs below:
[screenshot of CloudWatch logs]

Also, see the relevant part of the inference script:
[screenshot of inference script]

You would observe from the log that execution failed at the point of initializing the s3 client.

Thanks.

@pmaoui

pmaoui commented May 16, 2024

I also have this bug. Here is one message I got while running some tests to fix it:
[screenshot of error message]

Hope it could help.

As a workaround I used the awscli already present in the container:

import subprocess
subprocess.run(["/usr/local/bin/aws", "s3", "cp", "s3://bucket/file", "/local/file"], check=True)

@shresthapradip

I am also getting the same error. It was working fine a few weeks ago.

@shresthapradip

So,

s3.Bucket(settings.S3_BUCKET).put_object(Key=key, Body=file_data)

works, but the following code doesn't. This is a nightmare :)

res = self.s3.put_object(Bucket=settings.S3_BUCKET,
                         Key=key,
                         Body=file_data)

@shresthapradip

Probably the same thing goes for get_object.

@RyanFitzSimmonsAK
Contributor

Given that you're only seeing this behavior in Sagemaker inference scripts, it's likely not purely a Boto3 problem. I've reached out to the Sagemaker team for more information about this issue, and will update this issue whenever I have more information.

Ticket # for internal use : P133939124

@RyanFitzSimmonsAK
Contributor

Neither I nor the service team were able to reproduce this issue. Could you provide the following information?

  • What Sagemaker image are you using?
  • Are you following an example notebook?
  • Are you deploying in a VPC?
  • Can you provide a minimal inference.py that produces this behavior?

@RyanFitzSimmonsAK RyanFitzSimmonsAK added the response-requested Waiting on additional information or feedback. label Jun 13, 2024
@MustaphaU
Author

MustaphaU commented Jun 14, 2024

@RyanFitzSimmonsAK
Thanks. I'm not following an example notebook or deploying in a VPC. I have created a repo with instructions to reproduce the issue here:
https://github.com/MustaphaU/rerror

@deepblue-phoenix

Seeing this issue as well, except when creating clients for Secrets Manager via boto3.

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Jun 15, 2024
@RyanFitzSimmonsAK
Contributor

Hi, just an update. The service team was able to reproduce this behavior, and is working on determining the root cause.

@deepblue-phoenix

Hi, just an update. The service team was able to reproduce this behavior, and is working on determining the root cause.

this is fantastic news! thank you team! :)

just for external planning and orientation, are there any ideas roughly if this is a high-priority issue or some other level?
the bug is exhibiting for us in one of our critical paths. we have a temporary bypass for it but would really like to get back to using boto3 fully.

appreciate the help, and very happy you can reproduce the issue :)

@BadSergey87

BadSergey87 commented Jun 27, 2024

I was facing the same issue when trying to build a SageMaker TensorFlow serving image.
Adding a monkey patch at the very top of python_service.py helped me:

import gevent.monkey
gevent.monkey.patch_all()

this was suggested in stackoverflow thread here: https://stackoverflow.com/questions/45425236/gunicorn-recursionerror-with-gevent-and-requests-in-python-3-6-2

@MustaphaU
Author

I was facing the same issue when trying to build a sagemaker serving tenserflow image. adding monkey patch to python_service.py at the very top helped me.

import gevent.monkey
gevent.monkey.patch_all()

this was suggested in stackoverflow thread here: https://stackoverflow.com/questions/45425236/gunicorn-recursionerror-with-gevent-and-requests-in-python-3-6-2

Thanks for the suggestion. I had tried this fix, but it didn't resolve the issue; I mentioned it on Stack Overflow.

@BadSergey87

I was facing the same issue when trying to build a sagemaker serving tenserflow image. adding monkey patch to python_service.py at the very top helped me.

import gevent.monkey
gevent.monkey.patch_all()

this was suggested in stackoverflow thread here: https://stackoverflow.com/questions/45425236/gunicorn-recursionerror-with-gevent-and-requests-in-python-3-6-2

Thanks for the suggestion. I had tried this fix but it didn't resolve the issue. I mentioned it here on stackoverflow

You don't clarify it, but did you add it to your model inference code, or did you build the SageMaker image with it? It didn't work for me when I tried it in the inference code. The patch has to happen before any other Python import.
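One quick way to confirm whether the patch is being applied too late is to inspect sys.modules just before gevent.monkey.patch_all() runs: if 'ssl' is already present, the patch cannot fully take effect, which matches the warning gevent itself emits about patching ssl after import. A minimal sketch (the helper name is illustrative):

```python
import sys

def imported_before_patch(module_name="ssl"):
    """Return True if module_name is already in sys.modules, i.e. it
    was imported before this point. Calling this immediately before
    gevent.monkey.patch_all() tells you whether the patch would
    arrive too late for that module."""
    return module_name in sys.modules
```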

@HAdamCode

@deepblue-phoenix Would you mind sharing your workaround for this issue?

Does anyone have a solution or a timeline on this?

@MustaphaU
Author

MustaphaU commented Jul 18, 2024

@deepblue-phoenix Would you mind sharing your workaround for this issue?

Does anyone have a solution or a timeline on this?

You could try the suggestions by @shresthapradip or this workaround by @pmaoui if it applies to your case.

@marcoleino

Hello, I am having the same issue with a SageMaker custom inference.py script (attached).

I tried both using gevent.monkey.patch_all() and gevent.monkey.patch_all(ssl=False), but the issue persists. I hope there will be a solution soon.

My inference.py :

import gevent.monkey
gevent.monkey.patch_all(ssl=False)

import json
import numpy as np
from PIL import Image
import io
import logging
import tempfile

import boto3

# Configure logger
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

s3_client = boto3.client('s3')

def open_image(image_data):
    try:
        return Image.open(io.BytesIO(image_data))  # Supports every type of image extension
    except Exception as e:
        logger.error(f"Error opening image: {str(e)}")
        raise

def read_image_from_s3(s3_uri):
    """Load image file from s3.

    Parameters
    ----------
    s3_uri : string
        S3 URI in the form s3://bucket/key

    Returns
    -------
    np.array
        Image array
    """
    try:
        bucket, key = s3_uri.replace("s3://", "").split("/", 1)
        logger.info(f"Parsed bucket: {bucket}, key: {key}")
        
        logger.info(f"Reading image from bucket: {bucket}, key: {key}")
        
        s3 = boto3.resource('s3')
        bucket = s3.Bucket(bucket)
        object = bucket.Object(key)
        response = object.get()
        file_stream = response['Body']
        im = Image.open(file_stream)
        image_array = np.array(im)
        
        logger.info(f"Successfully read image from S3 bucket: {bucket}, key: {key}")
        return image_array
    except Exception as e:
        logger.error(f"Error reading image from S3 bucket: {bucket}, key: {key}, error: {str(e)}")
        raise

def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API

    Args:
        data (obj): the request data stream if images, dict or string if text.
        context (Context): an object containing request and configuration details

    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    try:
        logger.info(f"Request content type: {context.request_content_type}")

        with tempfile.TemporaryDirectory() as temp_dir:
            logger.info(f"Created temporary directory at {temp_dir}")

            if "image" in context.request_content_type:
                payload = data.read()
                image = open_image(payload)            
                image_array = np.array(image)
                image_with_batch_dim = np.expand_dims(image_array, axis=0)  # Add batch dimension
                # Input format is the same as TF Serving API: https://www.tensorflow.org/tfx/serving/api_rest
                response_payload = json.dumps({"instances": image_with_batch_dim.tolist()})  # tolist preserves the shape [1, 224, 224, 3]
                return response_payload
            
            elif "json" in context.request_content_type:
                payload = data.read().decode('utf-8')
                json_data = json.loads(payload)

                # Assuming the structure of json_data is {"s3_uris": ["s3://bucket/key1", "s3://bucket/key2", ...]}
                s3_uris = json_data.get("s3_uris", [])
                logger.info(f"Received S3 URIs: {s3_uris}")

                images = []
                
                for s3_uri in s3_uris:
                    try:
                        image_array = read_image_from_s3(s3_uri)
                        images.append(image_array)
                    except Exception as e:
                        logger.error(f"Failed to process image from S3 URI {s3_uri}: {str(e)}")
                
                if not images:
                    raise ValueError("No valid images found in the provided S3 URIs.\n Please, provide a json stream with key 's3_uris' and a list of uris as value.")
                
                images_with_batch_dim = np.stack(images, axis=0)  # Stack images to create a batch
                response_payload = json.dumps({"instances": images_with_batch_dim.tolist()})
                return response_payload
            
            raise ValueError(f'{{"error": "unsupported content type {context.request_content_type or "unknown"}"}}')
    except Exception as e:
        logger.error(f"Error in input_handler: {str(e)}")
        raise

def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.
    Args:
        data (obj): the TensorFlow serving response as described here: https://www.tensorflow.org/tfx/serving/api_rest#response_format_4
        context (Context): an object containing request and configuration details
    Returns:
        (bytes/json, string): data to return to client, response content type
    """
    try:
        if data.status_code != 200:
            raise ValueError(data.content.decode('utf-8'))

        response_content_type = context.accept_header
        prediction = data.content
        return prediction, response_content_type
    except Exception as e:
        logger.error(f"Error in output_handler: {str(e)}")
        raise

@tim-finnigan
Contributor

For those using gevent, there is an issue here being tracked on their side for that: gevent/gevent#1826. This issue appears to be specific to RHEL-based systems. Please note that we do not provide or officially support gevent with our networking setup. Any issues related to gevent will need to be addressed by the gevent team.

@PrathameshDa

I also faced the same issue, but it can be fixed using:

import gevent.monkey
gevent.monkey.patch_all()

Thanks, Everyone

@ayersb

ayersb commented Aug 23, 2024

This thread was helpful for debugging this issue, so I'm posting my team's context and solution to this problem.

We encountered this issue after updating packages in a flask application that uses gunicorn to launch "gevent workers" on python 3.10.

The issue appears to have been caused by gevent monkey patching occurring too late after the application's Python process was started. Gunicorn itself has a built-in warning log for this that looks like:
/usr/local/lib/python3.10/site-packages/gunicorn/workers/ggevent.py:38: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/usr/local/lib/python3.10/site-packages/urllib3/util/ssl_.py)'

We'd seen this warning in the past without it causing problems, but with the newly updated packages we ran into this issue when downloading files from S3 using boto3.

There are two ways to fix this. One was to follow the advice in a closed gunicorn GitHub issue and NOT use a gunicorn.py config file, instead passing configs as params to the gunicorn process in our entrypoint script. The solution we ended up going with was to monkey-patch gevent at the start of our config script, which we didn't previously realize ran in the same Python process as the workers.

import gevent.monkey

# Monkey patching needs to happen here, before anything else.
# Gunicorn automatically monkey-patches the worker processes when using
# gevent workers, but the way it does this does not strongly guarantee
# that the patching will happen before this file loads, which can cause
# issues with core libraries like SSL.
gevent.monkey.patch_all()

import multiprocessing  # noqa: E402

Outside of gunicorn, I think there are two paths to debugging this:

Path 1) You know you're already using gevent to monkey patch

  • Make sure gevent monkey patching happens BEFORE ANYTHING ELSE. That includes any other package imports. You may need to ignore lint rules.
  • Make sure what you think is the entrypoint for your application actually is the entrypoint. If some other file loads first, the patching may not work correctly. You can debug this by adding a print or log line for dir() at the very top of your application, e.g.
$ python
Python 3.10.13 (main, May 16 2024, 15:17:11) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
>>> import multiprocessing
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'multiprocessing']

Path 2) You don't think you're using gevent at all.

  • Try running the snippet of code below to verify that some third-party library isn't using gevent without your knowing it. If something is, search your libs for whatever is causing the problem and either replace the problematic library, move it to the very top of your imports, or run gevent monkey patching yourself before importing ANYTHING.
  • If you're definitely not using gevent at all, then some other bug entirely is causing this issue with boto3
from gevent.monkey import is_module_patched
...
# Place this where it makes sense for your application
if is_module_patched("socket"): # Socket will VERY LIKELY be patched by any lib using gevent
    raise RuntimeError("Gevent was already monkey patched")
else:
    logging.info("Gevent was NOT monkeypatched")
