nerdmon on Fri, 21 Jun 2019 19:31:00
I have a tabular AAS cube with 70,000,000 rows and 206 columns. It has 21 partitions of 3-6 million rows each. I've modified the data source to run all partitions in parallel for processing. The source data is a materialized table in Azure SQL Data Warehouse that is DW1000c tier.
When I run a full refresh of this AAS cube, it takes 1.5-2 hours and requires an S4 tier to accommodate roughly 65 GB of in-memory staging during processing.
However, no matter how many partitions or how much concurrency I use, I cannot improve the processing time; the rows-read rate caps out at about 12k rows/sec.
Can anyone advise on how to improve this processing duration? I am shocked that a process between AAS and SQL Data Warehouse, both in the West US Azure region, can take that long for a tabular cube that is only 20 GB in size.
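For reference, parallel partition processing of this kind is typically driven by a TMSL `sequence` command with `maxParallelism` set, along these lines (the database and table names here are placeholders, not the actual model):

```json
{
  "sequence": {
    "maxParallelism": 10,
    "operations": [
      {
        "refresh": {
          "type": "full",
          "objects": [
            { "database": "MyModel", "table": "FactTable" }
          ]
        }
      }
    ]
  }
}
```

Run from SSMS (XMLA endpoint) or the AAS REST API, this processes all partitions of the table with up to 10 in flight at once.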
Stuartb113 on Mon, 24 Jun 2019 12:45:04
A couple of ideas come to mind -
* Are the columns mostly dimension key fields? If not, could the table be broken out into a dimensional model?
* Could you change your partition processing methodology to only process an 'Active' partition on an ongoing basis? i.e. do not continually reprocess the other 20 partitions if their data is not changing
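As a sketch of the 'Active' partition approach, a TMSL refresh can target a single partition rather than the whole table (database, table, and partition names below are hypothetical):

```json
{
  "refresh": {
    "type": "dataOnly",
    "objects": [
      {
        "database": "MyModel",
        "table": "FactTable",
        "partition": "FactTable_Current"
      }
    ]
  }
}
```

Only the current partition's rows are re-read from the warehouse; a separate `calculate` refresh at the database level then rebuilds calculated columns and relationships, which is far cheaper than re-reading 70 million rows.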
nerdmon on Mon, 24 Jun 2019 16:01:09
* It is a denormalized view with dimensions because the users did not want two separate sections to search.
* The data is changing daily unfortunately.
Stuartb113 on Wed, 26 Jun 2019 11:48:57
A super table or denormalised view approach will certainly inherit daily row changes. With a dimensional model you would have more flexibility in processing individual tables and, unless the fact is aggregated, you shouldn't have to reprocess historic fact partitions whose rows have not changed.
You could then develop a report that allows users easy Filtering of data by the dimension attributes.
My other suggestion would be to refer to -