-
Type:
New Feature
-
Resolution: Fixed
-
Priority:
Major
-
Component/s: DB, SolarQuery
-
None
Currently SolarQuery API methods like /datum/list and /datum/reading allow providing a list of node IDs and a list of source IDs as criteria. These criteria are basically passed to SQL queries as-is, so that all combinations of the given node and source IDs can match. For example:
SELECT * FROM datum WHERE node_id IN (1,2,3) AND source_id IN ('foo', 'bar')
This can result in a slower-than-expected execution time, because the query is essentially looking for a cross product of possible matches, of the number of node IDs times the number of source IDs. The longer the list of node IDs or source IDs becomes, the larger and slower the query becomes. It is especially bad when some source IDs only appear with some node IDs, and the query is forced to look for a set of data that will never match.
It might be possible to optimize the queries if instead of a cross product of possible matches, we instead specified an exact mapping of node IDs and associated source IDs, and then query only for those sets. For example, if we know source ID 'foo' can only be found with node ID 1, and 'bar' with node IDs 2 & 3, then we could do this:
SELECT * FROM datum WHERE (node_id IN (1) AND source_id IN ('foo')) OR (node_id IN (2,3) AND source_id IN ('bar'))
That exact query might not be faster, but it could also be split into two parallel queries, and merged, like:
PARALLEL_MERGE( SELECT * FROM datum WHERE (node_id IN (1) AND source_id IN ('foo')) SELECT * FROM datum WHERE (node_id IN (2,3) AND source_id IN ('bar')) )