Azure OpenAI Semantic Caching is generating unique cacheKeys for identical inputs

Mikies Copoulos 10

Hello,

I'm asking about a blocking issue I've run into when trying to use semantic caching for Azure. I've configured the semantic caching policy per the documentation here: https://zcusa.951200.xyz/en-us/azure/api-management/azure-openai-semantic-cache-store-policy.

All of the endpoints return chat completions and embeddings correctly, and the Azure Cache for Redis appears to have been added successfully as an external cache in the APIM instance. However, I cannot get any cache hits at all, because the cacheKey used in the lookup is different for identical inputs. Specifically, the last segment concatenated onto the cacheKey appears to be a UUID, and it is unique on every call regardless of the input.

The APIM trace for a response shows no errors. The backend-service returns a 200 response, indicating that it successfully retrieves the embeddings array for the user input. After the response, given a cache miss, the trace shows a message indicating that the input will be added to the cache once the output stream has ended.

Regardless, the cacheKey is different each time, even though the embeddings array returned appears to be identical each time. For example, given two requests with identical inputs, the two keys might end with:

... :ChatCompletions_Create.16166:8::ef6f22a4-cd77-4db1-93e8-4bd8dfc88822

... :ChatCompletions_Create.16166:8::b2373bc6-dfba-442f-840c-141129310d3f
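To make the comparison concrete, here is a minimal Python sketch that splits the visible tails of the two keys on `:` and reports which segments differ. The helper names (`differing_segments`, `is_uuid`) are my own, and the assumption that `:` is the segment separator is only inferred from how the keys look in the trace, not from any documented key format. The elided prefix (`...`) is left out because I don't have it in full.

```python
import uuid

# Visible tails of the two cache keys from the APIM trace (prefix elided in the trace).
KEY_A = "ChatCompletions_Create.16166:8::ef6f22a4-cd77-4db1-93e8-4bd8dfc88822"
KEY_B = "ChatCompletions_Create.16166:8::b2373bc6-dfba-442f-840c-141129310d3f"


def is_uuid(segment: str) -> bool:
    """Return True if the segment parses as a UUID."""
    try:
        uuid.UUID(segment)
        return True
    except ValueError:
        return False


def differing_segments(key_a: str, key_b: str) -> list[tuple[int, str, str]]:
    """Split both keys on ':' (assumed separator) and list the positions that differ."""
    parts_a, parts_b = key_a.split(":"), key_b.split(":")
    if len(parts_a) != len(parts_b):
        raise ValueError("keys have a different number of segments")
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(parts_a, parts_b))
        if a != b
    ]


if __name__ == "__main__":
    for index, a, b in differing_segments(KEY_A, KEY_B):
        print(f"segment {index}: {a} != {b} (UUIDs: {is_uuid(a) and is_uuid(b)})")
```

Run against the two tails above, only the final segment differs, and both values parse as UUIDs; every other segment is identical.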

Unfortunately, I couldn't find any documentation online about how the cacheKey is formed in the Azure OpenAI semantic caching policy, or about what this UUID might represent.

Is there an error somewhere that isn't obvious? What might be happening here? Has anyone ever encountered anything similar?

MichalChmel-5394 10 Reputation points

2024-09-24T12:40:35.58+00:00

I am seeing the exact same behaviour, using Redis Cache Enterprise with RediSearch. I can see that the cache is in fact being used, but every prompt results in a miss.

Tested with gpt-4.

1 answer

Deleted

This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.
