Person: In Looker, caching is a useful feature that LookML developers can use to reduce database load and optimize query performance. Caching leverages the saved results from previously executed queries so that the same query does not need to be run on the database each time. The overall caching process in Looker is straightforward and begins with a query. As users explore and analyze data, Looker generates SQL queries and checks whether there are valid cached results for each one. If there are valid cached results for a query, Looker avoids requesting the same data again from the database and instead simply sends those cached results back to the user. If there are no valid cached results for that query, Looker sends the query to the connected database. The new SQL results are then cached and stored in an encrypted file on the Looker instance. This file can then be used by Looker if and when the same query is run again.

To help ensure that cached results remain valid and up to date, LookML developers can set up data groups, or caching policies, to manage the frequency and conditions for caching on a Looker instance. For example, a data group can be created to ensure that all cached results are updated at least once an hour, or whenever a new ID is added for a key field such as user ID or order ID. Looker then uses these data groups to check whether cached results are still valid. If the cached results are no longer valid per the data groups, Looker sends the query to the database to obtain new results. In Looker, data groups can be defined to establish caching policies for entire models, individual Explores, or specific PDTs in your LookML projects, and to ensure a refreshed cache. The number and types of data groups you need depend on how often your data is updated and should take into account your organization's extract, transform, and load (ETL) process and business requirements.

Let's discuss how to define a data group. To define a new data group using LookML, you provide a unique name and at least one of two parameters: max_cache_age and sql_trigger. The first parameter, max_cache_age, specifies the maximum amount of time to keep a cached result, such as 24 hours. The other parameter, sql_trigger, is used to write a SELECT statement that tells Looker whether the results have changed. The sql_trigger statement should be written to return only one value, such as the maximum ID value in a table. Looker sends this statement to the connected database on a regular schedule, and when the returned value changes, Looker takes that as a cue to refresh the cache. While only one of these parameters is required, we recommend as a Looker best practice using both to achieve the desired caching results. For example, if no change is ever detected by the sql_trigger check, that could mean something went wrong with the database ETL process or with the sql_trigger itself; by including a max_cache_age, the cache still gets refreshed after a set duration regardless.

The frequency of the sql_trigger's pinging of the database is defined in the connection settings by your Looker administrator. It is determined by a cron string in the connection's PDT and Datagroup Maintenance Schedule field, and by default it is every 5 minutes. The frequency of the sql_trigger check should roughly match the frequency of your data updates. For example, every 5 minutes is too frequent if your data warehouse is only updating every few hours or once per day. In addition, companies using certain database dialects may want to let the database hibernate outside of work hours to save money, so they would not want Looker waking up the database with sql_trigger checks.
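As a rough sketch, a data group definition along those lines might look like this in a LookML model file (the data group name orders_datagroup and the orders table are illustrative, not from any particular project):

    datagroup: orders_datagroup {
      # Looker runs this check on the schedule set by the administrator
      # (every 5 minutes by default); when the returned value changes,
      # the cached results are invalidated.
      sql_trigger: SELECT MAX(order_id) FROM orders ;;
      # Safety net: refresh the cache at least once every 24 hours,
      # even if the trigger never detects a change.
      max_cache_age: "24 hours"
    }

Using both parameters together reflects the best practice described above: the trigger keeps the cache fresh as new orders arrive, while max_cache_age guarantees a periodic refresh if the trigger check ever fails silently.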
Be aware, though, that in Looker, defining a data group by itself doesn't do anything. It is a two-step process: after defining the data group, you need to apply it to a LookML object. For example, you can use the persist_with parameter to apply a data group at the model level. When you do this, Looker applies the same caching rules to all Explores within that model. In fact, whenever you create a new LookML project by having Looker generate the model from the database schema, Looker automatically creates a default data group in the model file that you can customize as needed.

You can also choose to apply a caching policy to an individual Explore, which overrides whatever is set at the model level. To apply a data group to a specific Explore, use the persist_with parameter within that Explore's definition rather than at the model level. To apply a data group to a specific set of Explores, but not all Explores in a model, use the persist_with parameter in each of those Explores' definitions and specify the same data group name. Since Explores are the foundation for all content in Looker, the same caching logic carries over to Looks and dashboards created from an Explore.

You can also use data groups to tell Looker when to rebuild a persistent derived table, or PDT. To do this, simply specify the data group name in the datagroup_trigger parameter of the PDT. While there are a few different options for persisting derived tables in Looker, using the datagroup_trigger parameter is the recommended best practice to ensure that the data remains current. So if you have ever created a persistent derived table in Looker, you may have already used a data group.

Additionally, schedules for Looks and dashboards can also run on data groups. You can instruct Looker to run a Look or a dashboard automatically upon expiration of the caching policy, so new data is retrieved and pre-cached for any business users who need it.

Please remember, though, that if your database connection is configured in Looker to use dynamic usernames, such as OAuth for BigQuery, then you cannot use data groups for models on that connection. Instead, use the persist_for parameter to cache Explore queries for a fixed amount of time. Also remember that when using OAuth for BigQuery, persistent derived tables are not supported. More information on dynamic usernames can be found in the Looker documentation on OAuth for BigQuery connections. A sketch of how these application parameters fit together follows below.
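To make the application step concrete, here is a minimal sketch of how those parameters might appear together, assuming the hypothetical orders_datagroup from earlier plus an illustrative users_datagroup, Explore, and view (in a real project the view would typically live in its own file):

    # Model level: every Explore in this model shares this caching policy.
    persist_with: orders_datagroup

    explore: users {
      # Overrides the model-level policy for just this Explore.
      persist_with: users_datagroup
    }

    view: customer_facts {
      derived_table: {
        # Rebuild this PDT whenever orders_datagroup triggers.
        datagroup_trigger: orders_datagroup
        sql: SELECT customer_id, COUNT(*) AS lifetime_orders
             FROM orders
             GROUP BY customer_id ;;
      }
    }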
In summary, every time a user runs a query, Looker checks to see whether that query has been run before. If it has not, Looker runs the query on the database and then caches the results for future use. If the query has been run before, Looker checks the caching policy to evaluate whether the results should still be considered valid. If the cached results are still valid, Looker returns them to the business user. If the query has been run before but the results are no longer valid per the caching policies, Looker sends the query to the database to get new results, and then caches those new results for future use.

With its efficient caching policies and varied options, Looker puts the power into your hands as a LookML developer to optimize queries and reduce database load for your organization's Looker instance.