Discussion and demonstration of an architecture that knits several pieces of Solr’s infrastructure together, with further detail on Solr’s new Time Routed Aliases (TRAs). The system is a machine learning system based on a non-parametric regression methodology taken from habitat ecology. The model is partially pre-calculated and stored in Solr so that it can be assembled on the fly to recommend documents a user may be interested in, based on recent data. What counts as “recent” is defined by a Solr filter query. Solr TRAs are used to help scale the system and to sunset old data. Technologies discussed in this talk include predictive modeling, Solr streaming expressions, indexing with JesterJ, and Solr TRAs. The latter half of the presentation goes into some depth on TRAs, which are useful for avoiding performance degradation due to index growth in systems built on continuously acquired timestamped data, such as the one presented. Both presenters helped build Solr’s TRA capability.
Patrick (Gus) Heck is the owner of Needham Software LLC. He has been solving search problems since 2010, has been an independent Solr consultant since 2012, and has been a frequent contributor to the Apache Solr project since 2013.