Thursday, 14 May 2015

Search across Solr Cores

Solr Findings: Multi Core, Multi item search in Solr

Listed below are some features that are helpful when working with Solr.

1) Searching across all cores: In some cases we need to search across multiple cores in a Solr environment. This can be done with the shards parameter.
Solr's 'shards' feature splits a huge index into parts to make searching faster. For an 'all core search', each core can be treated as a shard. For example, say we have two cores (us_en, es_en).
Both cores can be queried using the parameters below:
http://localhost:8983/solr/us_en/select?shards=localhost:8983/solr/us_en,localhost:8983/solr/es_en&indent=true&q=*:*&df=title

Here df is the default query field, and q=*:* is a wildcard search that matches all documents.
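
To show how the shards parameter is assembled, here is a minimal Python sketch that builds the same kind of URL for any list of cores. The function name all_core_search_url is made up for illustration, and the sketch assumes all cores live on the same host as in the example above.

```python
from urllib.parse import urlencode

def all_core_search_url(host, cores, query="*:*", default_field="title"):
    """Build a select URL that lists every core in `cores` as a shard."""
    # Each shard is addressed as host/solr/core, without the http:// prefix.
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    params = {"shards": shards, "indent": "true", "q": query, "df": default_field}
    # The request goes to the first core; urlencode percent-encodes the values.
    return f"http://{host}/solr/{cores[0]}/select?" + urlencode(params)

url = all_core_search_url("localhost:8983", ["us_en", "es_en"])
print(url)
```

The printed URL is the percent-encoded form of the one shown above, so ':' , '/' and ',' appear as %3A, %2F and %2C.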


2) Multi item search: Let us say we need to search for multiple items using the same query. We can do a multi parameter search against a core as shown below.

http://localhost:8983/solr/es_en/select?q=*:*&df=id&wt=json&indent=true&fq=(id:"id1" | id:"id2" | id:"id3")

The tested URL is shown below.
http://localhost:8983/solr/us_en/select?q=*%3A*&df=id&wt=json&indent=true&fq=(id:"id1" | id:"id2" | id:"id3")
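
When the list of ids comes from code, the filter query can be generated instead of typed by hand. The sketch below is only an illustration: multi_id_search_url is a made-up helper, and it joins the clauses with the explicit OR keyword of Solr's standard query syntax rather than the '|' used in the URLs above. It also assumes the ids contain no double quotes.

```python
from urllib.parse import urlencode

def multi_id_search_url(host, core, ids):
    """Build a select URL whose fq matches any of the given ids."""
    # Quote each id so that ids with slashes or spaces are matched as a whole.
    clauses = " OR ".join(f'id:"{doc_id}"' for doc_id in ids)
    params = {"q": "*:*", "df": "id", "wt": "json", "indent": "true",
              "fq": f"({clauses})"}
    return f"http://{host}/solr/{core}/select?" + urlencode(params)

url = multi_id_search_url("localhost:8983", "us_en", ["id1", "id2", "id3"])
print(url)
```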

Friday, 20 February 2015

Security precautions for Solr on Dispatcher

Things to take care of while configuring Solr behind Dispatcher:

Solr can be accessed directly through URLs. If we fail to block access to its UI, it becomes a vulnerability for the application. We should also block queries containing <delete> to protect the index from accidental or malicious deletion.

How are security rules created in Dispatcher?

Some default security rules are enabled for Dispatcher; a few of them may look like the ones below:

RewriteCond %{QUERY_STRING} ^.*(localhost|loopback|127\.0\.0\.1).*                     [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(\*|;|<|>|'|"|\)|%0A|%0D|%27|%3C|%3E||%7C|%26|%24|%25|%2B).*         [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(;|<|>|'|"|\)|%0A|%0D|%27|%3C|%3E|).*(/\*|union|select|insert|cast|set|declare|drop|update|md5|benchmark).* [NC]

We need to rewrite them so that normal search queries are not blocked, while update/delete operations still are.

How should delete by id and delete all be prevented at the dispatcher?

Delete by id from solr core en_US

If the id is '/content/project/us/en/home/testarticle.html'
Invoking the URL below deletes that id and all its records from the index.
http://localhost:8983/solr/en_US/update?stream.body=<delete><query>id:"/content/project/us/en/home/testarticle.html"</query></delete>&commit=true

Delete all from a Solr core en_US
Invoking the URL below deletes all data from the index of the specific core en_US. Think twice before executing this command, because it deletes *ALL* documents.
http://localhost:8983/solr/en_US/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true
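
The decision the dispatcher rules need to make can be sketched in a few lines of Python. This is not a dispatcher configuration, only an illustration of the logic under some assumptions: should_block is a made-up name, and the sketch treats any request to an update handler, any stream.body parameter, and any parameter value containing <delete> as something to reject.

```python
from urllib.parse import urlsplit, parse_qs

def should_block(url):
    """Return True for requests that could modify the index."""
    parts = urlsplit(url)
    # Any request to an update handler can change the index.
    if "/update" in parts.path:
        return True
    # parse_qs percent-decodes values, so %3Cdelete%3E and <delete> look alike.
    params = parse_qs(parts.query)
    if "stream.body" in params:
        return True
    return any("<delete>" in value.lower()
               for values in params.values() for value in values)

print(should_block("http://localhost:8983/solr/en_US/select?q=*:*&df=title"))
```

Both delete URLs above are rejected by this check, while an ordinary select query passes through.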

How can the publish instance avoid exposing the full Solr server URL?
Use the relative path '/solr' in the page or component that issues the Solr query, and add a rewrite rule on the dispatcher that maps requests under /solr to the Solr server. This keeps the Solr server URL hidden from visitors.

How to search across all Solr fields for a query?

How to search across all Solr fields?

Solr offers many levels of configuration, which makes it a rich search tool. Usually we search Solr against a specific field defined in schema.xml, but in some cases we need to search across multiple fields. Let us see how this can be achieved.

There are two ways to do this.

1. Using DisMax: Solr usually ships with the DisMax query parser, so in the query we just need to pass all the fields in the qf parameter, as shown below.

/select?defType=dismax&q="query1","query2","query3"&qf=field1 field2 field3

In the above case we are searching for 3 terms: query1, query2 and query3 (wrapped in double quotes so that terms containing spaces still fetch matching results).
field1, field2 and field3 are the fields in schema.xml to be searched.
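
A DisMax request like this can also be built programmatically so that spaces and quotes are encoded correctly. The following is a small sketch with made-up names (dismax_url and the core URL are assumptions); note that it separates the quoted phrases with spaces rather than commas.

```python
from urllib.parse import urlencode

def dismax_url(core_url, phrases, fields):
    """Build a DisMax select URL searching `phrases` in all `fields`."""
    params = {
        "defType": "dismax",
        # Wrap every phrase in double quotes so multi-word phrases stay together.
        "q": " ".join(f'"{phrase}"' for phrase in phrases),
        # qf takes a space-separated list of field names.
        "qf": " ".join(fields),
    }
    return f"{core_url}/select?" + urlencode(params)

url = dismax_url("http://localhost:8983/solr/us_en",
                 ["query1", "query2", "query3"],
                 ["field1", "field2", "field3"])
print(url)
```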

2. Another way is to collect all the data into a single field by copying it through schema.xml.

We need the line below in the schema file:

<field name="datacollection" type="text_general" indexed="true" stored="false" multiValued="true"/>

Then copy the contents of the required fields to the new field:

<copyField source="field1" dest="datacollection"/>
<copyField source="field2" dest="datacollection"/>
<copyField source="field3" dest="datacollection"/>

Then query with datacollection as the default field (for example, df=datacollection).

JSON response from Solr

Let us see how to generate a custom JSON format from a Solr search result.

Required configuration in Solr to get the desired JSON format:

There are cases where we need a JSON format other than the default one provided by Solr. Let us see how to get the desired JSON output from Solr.

Steps:

1) Add the line below to solrconfig.xml (in a multi core configuration, add it to the solrconfig.xml of each core):
<queryResponseWriter name="xslt" class="org.apache.solr.response.XSLTResponseWriter"/>

2) A sample JSON response writer stylesheet is given below. Save it as json.xsl in your /core/conf/xslt folder.

3) Append '&wt=xslt&tr=json.xsl' to the query to invoke the required JSON format.

Sample json.xsl

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" indent="yes" media-type="application/json"/>
  <xsl:variable name="count" select="response/result/@numFound"/>

  <xsl:template match="response">
    <xsl:text>{"relatedContent":{"totalRecords":</xsl:text><xsl:value-of select="$count"/><xsl:text>,"results":[</xsl:text>
    <xsl:apply-templates select="result/doc"/>
    <xsl:text>]}}</xsl:text>
  </xsl:template>

  <xsl:template match="result/doc">
    <xsl:text>{"title":"</xsl:text><xsl:apply-templates select="str[@name='title']"/>
    <xsl:text>","imageReference":"</xsl:text><xsl:apply-templates select="arr[@name='imageReference']"/>
    <xsl:text>","imageAltText":"</xsl:text><xsl:apply-templates select="str[@name='imageAltText']"/>
    <xsl:text>","firstName":"</xsl:text><xsl:apply-templates select="str[@name='firstName']"/>
    <xsl:text>","lastName":"</xsl:text><xsl:apply-templates select="str[@name='lastName']"/>
    <xsl:text>"}</xsl:text>
    <!-- append a comma after every element except the last -->
    <xsl:if test="position()!=last()">
      <xsl:text>,</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
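
To illustrate how a client might request and consume this format, here is a short Python sketch. The helper name xslt_json_url and the core URL are assumptions, and the sample output is a hand-written placeholder that follows the structure of the stylesheet, not a real Solr response.

```python
import json
from urllib.parse import urlencode

def xslt_json_url(core_url, query):
    """Build a select URL that asks Solr to transform results with json.xsl."""
    params = {"q": query, "wt": "xslt", "tr": "json.xsl"}
    return f"{core_url}/select?" + urlencode(params)

url = xslt_json_url("http://localhost:8983/solr/us_en", "*:*")

# Made-up example of what the stylesheet would emit for one matching document.
sample_output = (
    '{"relatedContent":{"totalRecords":1,"results":['
    '{"title":"Sample title","imageReference":"/content/dam/sample.png",'
    '"imageAltText":"Sample alt","firstName":"Jane","lastName":"Doe"}'
    ']}}'
)
parsed = json.loads(sample_output)
print(parsed["relatedContent"]["results"][0]["title"])
```

The stylesheet writes field values without escaping them, so a value that contains a double quote would produce output that json.loads rejects.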

Integration of Solr with AEM/CQ + ZooKeeper: better designs

Here we explain design approaches for integrating Solr search with AEM.

Solr with CQ : Better designs

When designing Solr search with CQ, we have different approaches. One of the better approaches is explained below.

How many Solr instances in a production environment?
For the system to handle a large number of requests and remain fail-safe, we need at least two Solr instances set up behind a load balancer. Whenever a request arrives, the load is balanced across the instances and the request is served.
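
The behaviour expected from the load balancer can be sketched as round-robin selection with failover. This is only an illustration of the idea, not a replacement for a real load balancer: the class SolrBalancer, the node URLs, and the fetch callback (any function that takes a URL and returns a response or raises OSError) are all assumptions.

```python
from itertools import cycle

class SolrBalancer:
    """Round-robin over Solr nodes, skipping nodes that fail to respond."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._ring = cycle(self.nodes)

    def query(self, path, fetch):
        last_error = None
        # Try each node at most once per request, starting with the next in turn.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            try:
                return fetch(node + path)
            except OSError as err:
                last_error = err
        raise ConnectionError("no Solr node reachable") from last_error

balancer = SolrBalancer(["http://solr1:8983", "http://solr2:8983"])
```

With two nodes, a request is still served when one instance is down, which is the fail-safe property described above.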



How does the update work?
To keep both Solr instances updated behind the load balancer, we need Apache ZooKeeper configured (http://zookeeper.apache.org/). ZooKeeper distributes the configuration across the servers.

Can we have XSLT transformation for JSON with this design?
Yes. When we need a JSON response from Solr, we have to use our own XSLT files for the transformation. These can be kept as usual on the server where Solr is deployed (e.g. JBoss).

My XSLT changes are not appearing on the server. How to fix this?
After XSLT changes, the server sometimes caches the XSLT files. To get them refreshed, follow the steps below.
  • Stop JBoss
  • Stop ZooKeeper
  • Clear the temp folders of JBoss
  • Start JBoss and ZooKeeper