- Type: Improvement
- Resolution: Unresolved
- Priority: Major
- Component/s: Cloud Integrations
The Cloud Integration poll/rake import tasks persist "general" datum, which requires converting each datum into a "stream" datum and maintaining the stream's property metadata as every datum is persisted. SolarNode uses a more efficient approach: it compares each new datum against its stream metadata and, if the metadata is compatible with the datum, converts the "general" datum into a "stream" datum and uploads that. It thus falls back to uploading the "general" datum only when the stream metadata does not exist or is not compatible (i.e. is missing a property found on the new datum).
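The compatibility test described above amounts to a property-name comparison: a datum is compatible when every property it reports already has a slot in the stream metadata. A minimal sketch (the names `StreamMetadata` and `is_compatible`, and the three property categories, are illustrative assumptions, not the actual SolarNode API):

```python
from dataclasses import dataclass

@dataclass
class StreamMetadata:
    # property names the stream already knows, by category (assumed layout)
    instantaneous: tuple = ()
    accumulating: tuple = ()
    status: tuple = ()

def is_compatible(meta, datum_property_names):
    """A datum is compatible when every property it reports is already
    present in the stream metadata; a new property name would force the
    metadata to change, so the "general" path must be used instead."""
    if meta is None:
        return False  # no stream metadata exists yet
    known = set(meta.instantaneous) | set(meta.accumulating) | set(meta.status)
    return set(datum_property_names) <= known
```

For example, a datum reporting only `watts` against metadata that already lists `watts` would pass, while one adding a never-seen `frequency` property would fail and fall back to the "general" path.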
The Cloud Integration poll/rake import process could use a similar approach, as most of the time the stream metadata will not change, and checking to maintain it is expensive and slow. That is, when a given datum is compatible with its current stream metadata, convert the "general" datum into a "stream" datum and persist that instead.
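Applied to the poll/rake import, the per-datum flow might look like the following sketch, where `persist_stream` and `persist_general` stand in for the two persistence paths and the metadata is modeled as a simple mapping of category to known property names (all names here are hypothetical):

```python
def persist_datum(datum, meta, persist_stream, persist_general):
    """Persist a datum via the cheap "stream" path when the current
    stream metadata already covers every property it reports; otherwise
    fall back to the "general" path, which maintains the metadata."""
    known = set()
    if meta is not None:
        for names in meta.values():
            known |= set(names)
    if meta is not None and set(datum) <= known:
        return persist_stream(datum)   # fast path: metadata unchanged
    return persist_general(datum)      # slow path: metadata must be maintained
```

The design point is that the compatibility check is a cheap in-memory set comparison, done once per datum, while the metadata-maintenance work only runs for the rare datum that actually introduces a new property.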
Another aspect is the tracking of "stale" aggregate hours during the task. The "normal" flow for a task inserts only a small number of datum, as it tracks just recently added datum over time. For a paused/restarted task, however, a larger number of datum over a long period might need to be imported, and the current process tracks "stale aggregate hours" for every datum imported, even when there are multiple datum per hour.
The Datum Import process takes a more optimized approach to "stale hour" tracking: it defers inserting stale-hour records until the end of the import, keeping track of the changed hours internally and marking them as "stale" at the end of the import task. The poll/rake task could take a similar approach: defer tracking of "stale" hours until the end of the import.
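The deferred approach can be sketched as accumulating distinct (stream, hour) keys in memory and emitting one "stale" record per distinct hour at the end, instead of one per datum (class and method names are illustrative, not the actual implementation):

```python
from datetime import datetime

class StaleHourTracker:
    """Collects the distinct (stream, hour) combinations touched during
    an import, so "stale aggregate hour" records can be inserted once at
    the end of the task rather than once per datum."""

    def __init__(self):
        self._hours = set()

    def track(self, stream_id, ts):
        # truncate the datum timestamp to the hour it falls in;
        # duplicates within the same hour collapse in the set
        hour = ts.replace(minute=0, second=0, microsecond=0)
        self._hours.add((stream_id, hour))

    def flush(self, insert_stale_hour):
        # one insert per distinct stale hour, deferred to end of import
        for stream_id, hour in sorted(self._hours):
            insert_stale_hour(stream_id, hour)
        self._hours.clear()
```

For a restarted task importing thousands of datum across a few hundred hours, this reduces the stale-hour writes from one per datum to one per distinct hour.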
The last aspect to consider is datum/property import audit tracking. At the moment audit tracking is handled as each datum is persisted. It might be possible to defer this tracking until the end of the import and update the audit data at the end as one update per stream.
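Deferred audit tracking could similarly aggregate datum and property counts per stream in memory, then apply a single audit update per stream when the task finishes. A sketch under the assumption that the audit data is essentially per-stream counters (the real audit schema is not shown here):

```python
from collections import defaultdict

class AuditAccumulator:
    """Accumulates datum and property counts per stream so the audit
    data can be updated once per stream at the end of the import,
    instead of once per persisted datum."""

    def __init__(self):
        # stream_id -> [datum count, property count]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, stream_id, property_count):
        counts = self._counts[stream_id]
        counts[0] += 1               # one more datum for this stream
        counts[1] += property_count  # plus its properties

    def flush(self, update_audit):
        # one audit update per stream, deferred to end of import
        for stream_id, (datum_count, prop_count) in self._counts.items():
            update_audit(stream_id, datum_count, prop_count)
        self._counts.clear()
```

This turns many small audit writes into one write per stream, at the cost of losing audit visibility for a task that dies mid-import, which is a trade-off worth weighing.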