Optimize SQL Data Explorer queries

As your game becomes increasingly successful and your player base grows, you might find that even simple SQL Data Explorer queries can take significant time to run.

See the following basic query:

SELECT count(distinct event_name, user_id)/count(distinct user_id) as "avg number of event types per user"
FROM EVENTS
WHERE event_date>current_date-7

Trade query accuracy for speed by using (mod)hash

With a large dataset, this might take a while to run. However, since user_ids are stored as a hash, you can rewrite your code as follows:

SELECT count(distinct event_name, user_id)/count(distinct user_id) as "avg number of event types per user"
FROM EVENTS
WHERE event_date>current_date-7
AND mod(hash(user_id),100) = 63

Here, we’re splitting our users into 100 pseudo-randomly assigned and numbered buckets, looking at bucket number 63. As we increase computational complexity, filtering data in this way becomes more useful, saving time.

Trade query accuracy for speed by rounding

A simple way to improve query speed without 100% accuracy is to use approximate_count_distinct. Our previous query would become:

SELECT approximate_count_distinct(event_name, user_id)/ approximate_count_distinct(user_id) as "avg number of event types per user"
FROM EVENTS
WHERE mod(hash(user_id),100) = 63

Use smaller objects

We can improve many queries by using smaller objects.

Look at the SQL Data Explorer tables page to see what tables are available to query. These are available out-of-the-box with UGS:

  • EVENTS
  • USERS: holds a single row per player alongside their lifetime metrics in the game.
  • FACT_USER_SESSIONS_DAY: includes data on each session for each player.
  • FACT_EVENT_TYPE_USERS_DAY: consists of a row for each event that a player has sent each day, along with a total count.
  • FACT_WAU_USERS and FACT_MAU_USERS: include profile data for users who have played within the previous week or month on a given day.

Between FACT_EVENT_TYPE_USERS_DAY and FACT_USER_SESSIONS_DAY, you can probably answer 80%+ of most queries on smaller objects.

For example, to track mission fail rates, we could use the FACT_EVENT_TYPE_USERS_DAY to calculate overall failure rates each day, with the NUMBER_OF_EVENTS count stored in this table.

SELECT event_date
, round((SUM(case when event_name='missionFailed' then number_of_events else null end) / SUM(case when event_name='missionStarted' then number_of_events else null end))*100) as overallFailRate
FROM FACT_EVENT_TYPE_USERS_DAY
WHERE event_name in ('missionStarted','missionFailed')
AND event_date>current_date-30
GROUP BY event_date
ORDER BY event_date