Error: connection closed

Description

Intermittently we are getting a “Connection closed” error. This error means the socket was closed earlier than expected (i.e. we had not received the result of our request).

Example request: https://libraries-dcb-hub-admin-scaffold-uat-git-production-knowint.vercel.app/patronRequests/3c333e04-de0c-4d72-8926-232a46279941

Side effects:

  1. As this could happen at any time, the side effects could be almost anything, depending on where it occurs.

Investigation results so far:

  1. For the ones looked into so far, it has always occurred when we are trying to create the virtual item and place a hold at the borrowing agency.

Further Investigations required:

  1. Are we piggybacking requests on an already existing socket? If so, can we disable the connection cache for a few days to see if we can go for a period without this occurring? The reason I suggest this is that some of the LMS systems we talk to are ancient and may very well close the connection after returning the result to us, which, if we are grabbing a connection from the pool, we will not find out about until we attempt to send a request down that connection (a sketch of the relevant pool settings follows this list).

  2. Add logging so we know the URL, request body (if any) and the response body when the error occurs (a logging filter sketch follows this list).

  3. Add a report showing which sites it is happening against, to see how widespread this problem is.

  4. Look to see whether, when this occurs, it is feasible to determine if the request was actually performed or not, and if it wasn’t, try the request again.

  5. If disabling the cache stops the errors (currently we have at least one a day) and the report shows it is only occurring for specific sites / LMS, investigate whether it is feasible to disable connection caching just for those sites / LMS.
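To make items 1 and 5 concrete, here is a minimal sketch of the pool settings involved, assuming the standard Micronaut property names (MICRONAUT_HTTP_CLIENT_POOL_ENABLED is the environment-variable form of micronaut.http.client.pool.enabled; the per-service key and the service id "cals-polaris" are illustrative assumptions, not our actual configuration):

```java
import io.micronaut.context.ApplicationContext;
import java.util.Map;

// Minimal local experiment, not production code: boots a context with pooling disabled
// and prints the value that was actually resolved. Property names are assumptions based
// on Micronaut's standard HTTP client configuration.
public class PoolDisableExperiment {

    public static void main(String[] args) {
        try (ApplicationContext ctx = ApplicationContext.run(Map.<String, Object>of(
                // global switch; the property form of MICRONAUT_HTTP_CLIENT_POOL_ENABLED
                "micronaut.http.client.pool.enabled", false,
                // per-service form, if we only want to stop connection caching for specific
                // Host LMS clients; "cals-polaris" is a hypothetical service id
                "micronaut.http.services.cals-polaris.pool.enabled", false))) {

            System.out.println("pool.enabled = "
                    + ctx.getEnvironment()
                         .getProperty("micronaut.http.client.pool.enabled", Boolean.class)
                         .orElse(null));
        }
    }
}
```

In production the same keys would go on the task definition or in application.yml rather than being set programmatically; the snippet is only to pin down which properties we would be toggling.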

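For item 2, here is a sketch of one way to capture the URL and request body at the point of failure, as a Micronaut client filter. The match-all "/**" pattern and the log format are assumptions; when the connection is closed under us there is normally no response body to record, so only the outbound side is logged here:

```java
import io.micronaut.http.HttpResponse;
import io.micronaut.http.MutableHttpRequest;
import io.micronaut.http.annotation.Filter;
import io.micronaut.http.filter.ClientFilterChain;
import io.micronaut.http.filter.HttpClientFilter;
import org.reactivestreams.Publisher;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import reactor.core.publisher.Flux;

// Logs method, URI, request body (if any) and the error message whenever an outbound
// client call fails, so the intermittent "Connection closed" cases carry enough context.
@Filter("/**")
public class ConnectionClosedLoggingFilter implements HttpClientFilter {

    private static final Logger LOG = LoggerFactory.getLogger(ConnectionClosedLoggingFilter.class);

    @Override
    public Publisher<? extends HttpResponse<?>> doFilter(MutableHttpRequest<?> request,
                                                         ClientFilterChain chain) {
        return Flux.from(chain.proceed(request))
                .doOnError(error -> LOG.error("Outbound {} {} failed (request body: {}): {}",
                        request.getMethod(),
                        request.getUri(),
                        request.getBody().orElse(null),
                        error.getMessage()));
    }
}
```

Logging request bodies wholesale may expose patron data, so in practice we would probably want to redact or summarise them; this is only to show where the hook point is.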
Attachments (5)

Activity


Chas Woodfield August 23, 2024 at 1:49 PM
Edited

Have updated the script for capturing those where the connection was closed, as they do not all go to the error state. It makes it look a lot worse than it is, as there must be a scenario we have that is forcing it every time we track certain requests.

So attached is the latest

 

Jag Goraya August 21, 2024 at 4:09 PM

Since records began, CALS accounts for 23 out of 49 (47%).

In the last ~ month they are party to 14 out of 20 (70%) of affected requests.

Since the config was changed, they are party to 5 of 7 (71%) of affected requests, and Missouri River are party to the other 2 (29%).

The volumes aren’t huge, but the ratios are disproportionate for their experience. I’m curious whether re-enabling the pool causes a regression.

Chas Woodfield August 21, 2024 at 3:55 PM

Attached an updated list of requests

 

Jag Goraya August 13, 2024 at 8:59 AM

  • Majority of these are occurring during authentication, and on Polaris/Sierra standalones only

  • TODO:

    • add a retry during authentication (see the sketch after this list)

    • review whether MICRONAUT_HTTP_CLIENT_POOL_ENABLED is still a valid setting /cc
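A minimal sketch of what the authentication retry could look like using Micronaut's @Retryable; the class, method signature, and the attempt/delay values are placeholders rather than the actual DCB code:

```java
import io.micronaut.retry.annotation.Retryable;
import jakarta.inject.Singleton;

// Placeholder class: the point is only the @Retryable usage. A login request is safe
// to repeat, so a couple of extra attempts with a short back-off should absorb a
// socket that the LMS has already closed under us.
@Singleton
public class HostLmsAuthenticator {

    @Retryable(attempts = "3", delay = "500ms")
    public String authenticate(String baseUrl, String key, String secret) {
        // ... perform the real Polaris/Sierra login call here ...
        return "token";
    }
}
```

By default @Retryable retries on any exception; if the connection-closed failure surfaces as a specific exception type, its includes attribute could narrow the retry to just that case.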

Carole Godfrey August 8, 2024 at 4:50 PM

Confirming that setting MICRONAUT_HTTP_CLIENT_POOL_ENABLED=false is still in place on the task definition

Is there anything additional required for the setting, or any verification that can be made on the running task that the setting has taken effect? (A sketch of one option follows.)
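One verification that could be made on the running task, sketched below: log the value Micronaut actually resolved at startup. This assumes MICRONAUT_HTTP_CLIENT_POOL_ENABLED maps to micronaut.http.client.pool.enabled via the normal environment-variable convention, and the class name is hypothetical:

```java
import io.micronaut.context.env.Environment;
import io.micronaut.context.event.StartupEvent;
import io.micronaut.runtime.event.annotation.EventListener;
import jakarta.inject.Singleton;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Logs the resolved pool setting once at startup, so the running task's logs show
// whether the environment variable on the task definition actually took effect.
@Singleton
public class PoolSettingReporter {

    private static final Logger LOG = LoggerFactory.getLogger(PoolSettingReporter.class);

    private final Environment environment;

    public PoolSettingReporter(Environment environment) {
        this.environment = environment;
    }

    @EventListener
    public void onStartup(StartupEvent event) {
        LOG.info("micronaut.http.client.pool.enabled resolved to {}",
                environment.getProperty("micronaut.http.client.pool.enabled", Boolean.class)
                        .map(Object::toString)
                        .orElse("<not set>"));
    }
}
```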


Details

Environment

Production

Created June 27, 2024 at 1:53 PM
Updated October 17, 2024 at 10:45 PM