
Solr

From Wikitech
This page contains historical information. It may be outdated or unreliable.

Solr is a Lucene-based search engine. WMF used it for translation memory and spatial searches, but switched to Elasticsearch.

Puppet

modules/solr + manifests/role/solr.pp

Operations

How to poke it with a stick

Checking if it's OK:

$ curl 'http://solr1001:8983/solr/admin/cores?action=STATUS'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst>
(loads of XML)
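To turn this check into a pass/fail result (e.g. for a quick loop over hosts), a minimal sketch could look for status 0 in the response header. The function names are hypothetical; the host and port come from the example above.

```shell
# Succeed only if a Solr XML response (read on stdin) reports status 0
# in its responseHeader; any other value, or no response, is a failure.
solr_response_ok() {
    grep -q '<int name="status">0</int>'
}

# Query the CoreAdmin STATUS action on a host (default solr1001).
check_solr_cores() {
    host="${1:-solr1001}"
    if curl -s --max-time 10 "http://${host}:8983/solr/admin/cores?action=STATUS" \
        | solr_response_ok; then
        echo "${host}: OK"
    else
        echo "${host}: FAILED" >&2
        return 1
    fi
}
```

For example, `check_solr_cores solr1001` prints `solr1001: OK` when the core admin handler answers normally.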

Checking if search works:

$ curl 'http://solr1001:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on'
(loads of XML)
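If you only want to know how many documents the match-all query finds, a sketch like the following pulls `numFound` out of the XML response. It adds `rows=0` (a standard Solr parameter) so no documents are returned; the function names are hypothetical.

```shell
# Print numFound from a Solr XML search response read on stdin.
# The XML response writer emits <result name="response" numFound="N" ...>.
solr_num_found() {
    sed -n 's/.*<result name="response" numFound="\([0-9]*\)".*/\1/p' | head -n 1
}

# Run a match-all query against a host (default solr1001) and print the count.
solr_count_all() {
    host="${1:-solr1001}"
    curl -s --max-time 10 "http://${host}:8983/solr/select/?q=*%3A*&rows=0" \
        | solr_num_found
}
```

A count of 0 on a server that should hold data is a quick sign that something is wrong.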

Checking replication status:

$ curl 'http://solr1001:8983/solr/replication?command=details'
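To check whether a slave has caught up, one option is to compare index versions on the master and the slave. This sketch assumes the ReplicationHandler's `command=indexversion`, whose response carries `<long name="indexversion">` and `<long name="generation">`; the function names are hypothetical.

```shell
# Print the value of <long name="NAME">VALUE</long> from XML on stdin.
solr_xml_long() {
    sed -n "s/.*<long name=\"$1\">\([0-9-]*\)<\/long>.*/\1/p" | head -n 1
}

# Print the index version reported by one host.
solr_index_version() {
    curl -s --max-time 10 "http://$1:8983/solr/replication?command=indexversion" \
        | solr_xml_long indexversion
}

# Usage: compare_index_versions MASTER SLAVE
compare_index_versions() {
    m="$(solr_index_version "$1")"
    s="$(solr_index_version "$2")"
    if [ -n "$m" ] && [ "$m" = "$s" ]; then
        echo "in sync: indexversion $m"
    else
        echo "differ: master=${m:-?} slave=${s:-?}" >&2
        return 1
    fi
}
```

Slaves poll the master, so a mismatch shortly after an update is expected; a mismatch that persists is worth a closer look with `command=details`.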

Get schema remotely:

$ curl 'http://solr1001:8983/solr/admin/file/?contentType=text/xml;charset=utf-8&file=schema.xml'
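A common reason to fetch the schema is to confirm that the running server uses the same schema as a local copy (e.g. the Puppet-managed file). A minimal sketch, with hypothetical function names and no assumption about where the local copy lives:

```shell
# Fetch the live schema.xml from a host (default solr1001).
fetch_solr_schema() {
    curl -s --max-time 10 \
        "http://${1:-solr1001}:8983/solr/admin/file/?contentType=text/xml;charset=utf-8&file=schema.xml"
}

# Unified diff of a local file against a schema read on stdin.
# Exits 0 and prints nothing when they are identical.
schema_diff() {
    diff -u "$1" -
}

# Example: fetch_solr_schema solr1001 | schema_diff /path/to/local/schema.xml
```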

Restart

# service jetty restart
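Solr takes a while to come back after a restart. A sketch that polls the STATUS URL until it answers with status 0 might look like this; the function name, the 2-second interval and the 60-second default limit are arbitrary choices, not anything the servers require.

```shell
# Poll a Solr STATUS URL until the response header reports status 0,
# or give up after LIMIT seconds (default 60).
wait_for_solr() {
    url="${1:-http://localhost:8983/solr/admin/cores?action=STATUS}"
    limit="${2:-60}"
    waited=0
    until curl -s --max-time 5 "$url" | grep -q '<int name="status">0</int>'; do
        if [ "$waited" -ge "$limit" ]; then
            echo "Solr not up after ${limit}s" >&2
            return 1
        fi
        sleep 2
        waited=$((waited + 2))
    done
    echo "Solr is up"
}

# As root:
#   service jetty restart && wait_for_solr
```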

Upgrading schema

  • ☠Delete existing documents☠:
$ curl 'http://localhost:8983/solr/update?commit=true&stream.body=%3Cdelete%3E%3Cquery%3E*%3A*%3C%2Fquery%3E%3C%2Fdelete%3E'
  • Stop the server:
# service jetty stop
  • Update schema.xml, e.g. by forcing a Puppet run:
# puppetd -tv
  • Reindex the data
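The steps above can be collected into one script. This is a sketch only: the function names are hypothetical, and because the first step deletes every document, it only prints the commands unless run with `DRY_RUN=0` (as root). Reindexing depends on the application and is left out.

```shell
# Print a command (dry run, the default) or execute it when DRY_RUN=0.
maybe_run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

upgrade_schema() {
    # 1. Delete all documents: URL-encoded <delete><query>*:*</query></delete>.
    #    -f makes curl fail on HTTP errors instead of silently continuing.
    maybe_run curl -sf 'http://localhost:8983/solr/update?commit=true&stream.body=%3Cdelete%3E%3Cquery%3E*%3A*%3C%2Fquery%3E%3C%2Fdelete%3E' || return 1
    # 2. Stop the server.
    maybe_run service jetty stop || return 1
    # 3. Force a Puppet run to install the new schema.xml.
    maybe_run puppetd -tv
    # 4. Reindex the data (not covered here).
}
```

Running `upgrade_schema` with no environment changes just lists the three commands, which is a cheap way to review them before doing it for real.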

Logs

  • When Jetty is restarted, it renames the current day's logs to *.<some number>.

In /var/log/jetty:

  • request.log - requests
  • stderrout.log - errors and debug information

These log file names are prefixed with the date in year_month_day format, followed by a dot.

Examples:

  • 2012_12_19.stderrout.log.100139764
  • 2012_12_20.request.log
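Because of the date prefix and the renaming on restart, one day's logs can be spread over several files. A sketch that lists all of them (the function name is hypothetical; the directory and date format are the ones described above):

```shell
# List Jetty log files for one day, including copies renamed with a
# numeric suffix after a restart.
# Usage: jetty_logs_for_day [DAY [DIR]]
#   DAY defaults to today as year_month_day, e.g. 2012_12_19
#   DIR defaults to /var/log/jetty
jetty_logs_for_day() {
    day="${1:-$(date +%Y_%m_%d)}"
    dir="${2:-/var/log/jetty}"
    find "$dir" -maxdepth 1 -type f -name "${day}.*" | sort
}
```

For example, `jetty_logs_for_day 2012_12_19 | grep stderrout` narrows the list to the error logs for that day.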

GeoData

Disabling updates

GeoData is updated every 30 minutes by a cron job on terbium. You can disable updates by creating /tmp/disable-update-geodata on that server; deleting the file re-enables them.
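A minimal sketch of helpers for this switch, to be run on terbium. The flag path is the one given above; the function names and the `GEODATA_FLAG` override are hypothetical and exist only to make the helpers easy to try elsewhere.

```shell
# Path of the flag file that disables GeoData updates.
GEODATA_FLAG="${GEODATA_FLAG:-/tmp/disable-update-geodata}"

geodata_updates_disable() { touch "$GEODATA_FLAG"; }
geodata_updates_enable()  { rm -f "$GEODATA_FLAG"; }

# Print "disabled" if the flag file exists, otherwise "enabled".
geodata_updates_status() {
    if [ -e "$GEODATA_FLAG" ]; then echo disabled; else echo enabled; fi
}
```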

Switching master

  • Disable updates (see above). Wait a while in case the update cron job was already running.
  • Change replication_master in manifests/role/solr.pp. When deploying this change, it is preferable to start with the new master.
  • Update $wgGeoDataSolrMaster and $wgGeoDataSolrHosts in CommonSettings.php. Don't forget to reduce the weight of the new master, so that load spikes during replication don't get too high.

Packages

We use our own backports:

$ dpkg -l '*solr*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                       Version                    Description
+++-==========================-==========================-====================================================================
ii  libsolr-java               3.6.0+dfsg-1               Enterprise search server based on Lucene - Java libraries
ii  solr-common                3.6.0+dfsg-1               Enterprise search server based on Lucene3 - common files
ii  solr-jetty                 3.6.0+dfsg-1               Enterprise search server based on Lucene3 - Jetty integration
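To spot a host that is not running the backported version, a sketch can compare the installed versions against the expected one. The function name is hypothetical; the package names and version are taken from the listing above.

```shell
# Report any package whose version differs from the expected one.
# Reads "package version" lines on stdin; exits 1 if any differ.
check_solr_versions() {
    awk -v want="$1" '
        $2 != want { printf "%s: %s (expected %s)\n", $1, $2, want; bad = 1 }
        END { exit bad }
    '
}

# Example:
#   dpkg-query -W -f='${Package} ${Version}\n' libsolr-java solr-common solr-jetty \
#       | check_solr_versions 3.6.0+dfsg-1
```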