Migrate data from Cosmos DB to TSI

Category: azure time series insights

Question

AxiansMarcH on Fri, 03 Apr 2020 11:21:04


We are investigating how we can migrate a customer's data to TSI. This is 2 years of time series data stored in Cosmos DB, and we are moving to our new architecture where we use TSI.

Does anyone have any tips?

Replies

António Sérgio Azevedo - MSFT on Thu, 09 Apr 2020 23:23:17


Hello AxiansMarcH ,

Thank you for your question!

If I understood correctly, you want to extract data from Cosmos DB and ingress it into TSI, making sure that the extraction from Cosmos DB outputs the documents sorted in the order in which they were modified?

Please share some more details of your scenario and what you have already tried. I'm also sharing the following docs with you, in case you haven't looked at them before:

https://docs.microsoft.com/en-us/azure/time-series-insights/time-series-insights-update-storage-ingress
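
In the meantime, here is a minimal sketch of the kind of extraction/replay loop I have in mind, assuming you replay the documents into the Event Hub (or IoT Hub) that your TSI environment uses as its event source. All names and connection strings below are placeholders, and you would want proper retry/checkpoint logic for a real migration:

import json
from azure.cosmos import CosmosClient
from azure.eventhub import EventHubProducerClient, EventData

# Placeholders - replace with your own account details.
COSMOS_URL = "https://<your-account>.documents.azure.com:443/"
COSMOS_KEY = "<cosmos-key>"
EVENTHUB_CONN_STR = "<event-hub-connection-string>"  # the TSI event source
EVENTHUB_NAME = "<event-hub-name>"

cosmos = CosmosClient(COSMOS_URL, credential=COSMOS_KEY)
container = cosmos.get_database_client("<db>").get_container_client("<container>")
producer = EventHubProducerClient.from_connection_string(
    EVENTHUB_CONN_STR, eventhub_name=EVENTHUB_NAME)

# Read documents ordered by their last-modified timestamp (_ts) so they
# reach TSI in the order in which they were modified.
query = "SELECT * FROM c ORDER BY c._ts ASC"
batch = producer.create_batch()
events_in_batch = 0
for doc in container.query_items(query=query, enable_cross_partition_query=True):
    event = EventData(json.dumps(doc))
    try:
        batch.add(event)
        events_in_batch += 1
    except ValueError:  # batch is full - send it and start a new one
        producer.send_batch(batch)
        batch = producer.create_batch()
        batch.add(event)
        events_in_batch = 1
if events_in_batch:
    producer.send_batch(batch)
producer.close()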


AxiansMarcH on Fri, 10 Apr 2020 09:44:10


Hi Antonio,

Thank you for your answer. We are able to pull the data out of Cosmos DB, but the challenge is that we have 2 years of IoT data in there, with around 250k messages each day. Standard TSI is capped at 1 million messages per day, so it will either be a very long migration, or we scale up and it will be very costly.
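
To give a rough idea of the scale, here is our back-of-the-envelope sizing (assuming a single S1 unit and the figures above):

# Rough backfill sizing: ~250k messages/day for 2 years, against the 1M/day S1 cap.
historical_messages = 2 * 365 * 250_000      # ~182.5 million messages to migrate
daily_headroom = 1_000_000 - 250_000         # S1 cap minus the ongoing live ingest
print(historical_messages / daily_headroom)  # roughly 243 days on a single S1 unit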

Can't we write the data to the blob storage directly, so that TSI can see it has the data? Or is there another way?

thank you!

Marc

António Sérgio Azevedo - MSFT on Mon, 20 Apr 2020 06:32:38


Hello Marc,

Let me check on alternatives with TSI team. Thanks for your time in advance!

António Sérgio Azevedo - MSFT on Tue, 28 Apr 2020 12:07:04


Hello Marc, thank you for your patience.

I have confirmed that the only option today is to scale up if you wish to stay on the Standard SKU. As you haven't imported the historical data yet, could you consider using PAYG instead? PAYG will be GA this year and there is no throttling in PAYG.

- You can nevertheless experience significant lag when doing this type of bulk import on PAYG, so we recommend that you set your IoT Hub (via ARM template/CLI) or Event Hub retention time to the maximum of 7 days so you have plenty of buffer. Also, we always encourage the white glove customers we talk to regularly to let us know ahead of time that they'll be doing a historical import via stream, so that our DRI can be notified and ready to move their environment to a different cluster if CPU and lag cross a certain threshold.
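
For example, a rough sketch of bumping the retention via the Azure CLI (all resource names below are placeholders):

az iot hub update --name <my-iot-hub> --resource-group <my-rg> \
    --set properties.eventHubEndpoints.events.retentionTimeInDays=7

az eventhubs eventhub update --resource-group <my-rg> --namespace-name <my-eh-namespace> \
    --name <my-event-hub> --message-retention 7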

Also, if your intention is to keep the 2 years of historical data for a long time, then PAYG is probably the right option, as it has cold storage in Azure Blobs. If you only want to bring the data in for analysis, then you can also choose to scale up S1...

Note that there are some differences in PAYG compared to the S1/S2 SKUs, e.g. the query API is different, so you should run a POC on PAYG to ensure it works for your scenario.

Thank you so much, hope that helps!

In case the information in this post is helpful, please feel free to mark this response as the answer so that it can help others searching for similar questions.