@TC A good way to test this scenario is to use Azure Speech Studio with a multilingual voice. Here is a screenshot of my sentence from the studio. Navigate to the Audio Content Creation tool from the Speech Studio home page, and you will be able to test this with any of the voices.
The tool has an option to switch to the SSML view of the sentence, which shows the exact SSML tags needed to get the required voice output. For the sentence above, the SSML should be:
<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US"><voice name="en-US-AndrewMultilingualNeural"><lang xml:lang="en-US">This is a test to check English and Spanish language pronunciation with a multilingual voice. This is 2 in english.</lang><lang xml:lang="es-ES"> Este es el 2 en español</lang>.</voice></speak>
The last part of the sentence, "Este es el 2 en español", is in Spanish and contains a 2. Because it is wrapped in the lang tag with xml:lang="es-ES", the 2 is spoken in Spanish in my audio.
You could try the same format in your own SSML and check whether it works for you.
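If you build the SSML in code rather than pasting it from the studio, a small helper along these lines may help. This is only a sketch: `build_multilingual_ssml` is a hypothetical name of my own, not part of the Speech SDK, and it assumes each piece of text is wrapped in its own lang element inside a single multilingual voice, as in the SSML above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_multilingual_ssml(voice, segments, default_lang="en-US"):
    """Hypothetical helper: wrap each (locale, text) pair in its own <lang> element."""
    parts = [
        # quoteattr adds the surrounding quotes and escapes the attribute value
        f"<lang xml:lang={quoteattr(locale)}>{escape(text)}</lang>"
        for locale, text in segments
    ]
    return (
        '<speak xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="http://www.w3.org/2001/mstts" '
        f'version="1.0" xml:lang={quoteattr(default_lang)}>'
        f"<voice name={quoteattr(voice)}>{''.join(parts)}</voice>"
        "</speak>"
    )


ssml = build_multilingual_ssml(
    "en-US-AndrewMultilingualNeural",
    [
        ("en-US", "This is 2 in English."),
        ("es-ES", "Este es el 2 en español."),
    ],
)

# Sanity check: raises xml.etree.ElementTree.ParseError if the SSML is not well-formed XML
ET.fromstring(ssml)
print(ssml)
```

The resulting string can then be pasted into the SSML view of the studio, or, if you use the Python Speech SDK, passed to the synthesizer's speak_ssml_async method.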
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". And if you have any further queries, do let us know.