hi Sam
The first thing that comes to mind is which of the above API calls SK actually decided to invoke. It's worth turning on SK's tracing and checking the API calls in the log to be 100% sure.
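A minimal sketch of turning that on with the SK 1.x C# SDK (model id and key are placeholders; `AddConsole` needs the Microsoft.Extensions.Logging.Console package):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;

var builder = Kernel.CreateBuilder();

// Placeholder model id and key -- substitute your own.
builder.AddOpenAIChatCompletion(modelId: "gpt-4o", apiKey: "<your-key>");

// Route SK's internal logs, including function invocations,
// to the console at Trace level so each tool call shows up.
builder.Services.AddLogging(logging =>
    logging.AddConsole().SetMinimumLevel(LogLevel.Trace));

var kernel = builder.Build();
```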
1) If SK uses GetRecordAsync(int id), that potentially means SK calls the OpenAI endpoint ~1000 times, once per record, so you are likely limited by the number of HTTPS requests rather than total tokens consumed. Be aware that OpenAI rate-limits both request count and total tokens. If this happens, you might force SK to use the bulk call instead; see the sketch below.
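One blunt way to force the bulk path is to only advertise the bulk function to the model. A hypothetical plugin sketch built from your two signatures (the stub bodies are mine):

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public class RecordPlugin
{
    // Advertised to the model: one call returns everything, so the
    // model cannot fan out into ~1000 per-record round trips.
    [KernelFunction, Description("Returns the text of all records in one call.")]
    public Task<IReadOnlyList<string>> GetRecordAsync() =>
        Task.FromResult<IReadOnlyList<string>>(new[] { "record 1", "record 2" }); // stub data

    // Deliberately NOT marked [KernelFunction]: still callable from your
    // own code, but invisible to the model, so it can never be chosen.
    public Task<string> GetRecordAsync(int id) =>
        Task.FromResult($"record {id}"); // stub data
}

// Registration: kernel.Plugins.AddFromType<RecordPlugin>("Records");
```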
2) If SK uses GetRecordAsync(), SK sends all 1000 records to the OpenAI endpoint in one go with all the text (assuming it fits the context length; worst case you might just split it into 2-3 calls, as in the sketch below). In this case, request count is likely not an issue, but total tokens consumed could be rate-limited.
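If the records don't fit in one context, a rough chunking sketch (plain C#, no SK dependency; the 4-chars-per-token heuristic and the budget are assumptions you'd tune):

```csharp
using System.Collections.Generic;

public static class RecordBatcher
{
    // Crude heuristic: ~4 characters per English token. Swap in a real
    // tokenizer for accurate counts; this only sizes the batches.
    private static int EstimateTokens(string s) => s.Length / 4 + 1;

    public static IEnumerable<List<string>> ChunkByTokenBudget(
        IEnumerable<string> records, int maxTokensPerCall)
    {
        var batch = new List<string>();
        var used = 0;
        foreach (var record in records)
        {
            var cost = EstimateTokens(record);
            if (used + cost > maxTokensPerCall && batch.Count > 0)
            {
                yield return batch;          // emit a full batch
                batch = new List<string>();
                used = 0;
            }
            batch.Add(record);
            used += cost;
        }
        if (batch.Count > 0) yield return batch; // emit the remainder
    }
}
```

Each chunk then becomes its own summarization request, and a final pass merges the partial summaries (map-reduce style).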
3) Continuing from 2): OpenAI needs to consume all the text records and summarize them to answer your question, so you can't really reduce the total tokens for a single user question. What you might be able to do is take advantage of OpenAI's prompt caching when the same prompt prefix (the record dump) is sent repeatedly; see the sketch below.
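If I remember the OpenAI docs right, caching kicks in automatically once the prompt exceeds a minimum length (around 1024 tokens) and only matches on an exact prefix, so keep the stable content first and the varying question last. A rough sketch with SK's ChatHistory (instruction text and record dump are placeholders):

```csharp
using Microsoft.SemanticKernel.ChatCompletion;

string allRecordsText = "...all 1000 record texts, concatenated..."; // placeholder for your record dump
string userQuestion = "How many records changed last week?";          // varies per request

var history = new ChatHistory();

// Stable prefix: byte-identical across requests over the same data
// snapshot, so it is eligible for OpenAI's automatic prompt caching.
history.AddSystemMessage("You summarize the records provided below."); // placeholder instructions
history.AddUserMessage(allRecordsText);

// Varying suffix: only the question changes between requests.
history.AddUserMessage(userQuestion);
```

Cached input tokens are discounted on cost; whether they also ease your rate limits is worth checking in the current docs. Either way the cache only helps if the record dump is byte-identical between requests.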
4) I don't think fine-tuning is going to help in your example. Your scenario is really just RAG: pulling a live data set for OpenAI to produce a summary. Fine-tuning is more about changing the default behavior of the LLM so it answers questions with the right approach in a very specific domain; it doesn't bake your live data into the model.