SOLR AND AEM

Sunday, 16 November 2025

Comparing Search Solutions for AEM as a Cloud Service: Solr vs Coveo vs Algolia

In the world of digital experiences, search is a critical component. For Adobe Experience Manager (AEM) as a Cloud Service, choosing the right search provider can significantly impact performance, scalability, user experience, and total cost of ownership. In this post, we compare three widely-used search technologies—Solr, Coveo, and Algolia—and how they integrate with AEM as a Cloud Service.

🔍 Overview of the Search Tools

Solr: Open-source, flexible, highly customizable search engine built on Apache Lucene.
Coveo: Enterprise-grade SaaS search platform with AI-driven recommendations and deep integrations.
Algolia: Hosted search API with emphasis on speed, relevance, and developer-friendly SDKs.

⚙️ Integration with AEM as a Cloud Service

Solr

Solr can be integrated with AEM using custom connectors that read content through Sling resource resolvers, together with indexing configurations. For AEM as a Cloud Service, Solr is typically hosted externally (e.g., on AWS or with a managed Solr provider), because the managed, service-based architecture does not let you run your own search server inside the AEM environment.

Pros: Open-source, flexible schema, control over infrastructure.
Cons: Setup and scaling complexity, requires DevOps effort, no native cloud management in AEMaaCS.

Coveo

Coveo offers an out-of-the-box integration with AEM, supporting both on-prem and cloud environments. With a connector for AEM, Coveo can index content automatically from AEM components and provide AI-powered search and recommendations.

Pros: SaaS, powerful AI/ML features, AEM-specific connector, fast deployment.
Cons: Licensing costs, less flexible than open-source solutions.

Algolia

Algolia integrates with AEM through REST APIs or middleware. It is often used for its speed and developer-friendly experience. It works well with AEM Headless setups and SPA implementations.

Pros: Lightning-fast responses, API-first approach, great developer tooling.
Cons: Pricing based on volume, requires integration coding, less native support for AEM content models.

📊 Feature Comparison Table

Feature	Solr	Coveo	Algolia
Deployment Type	Self-hosted / Managed	SaaS	SaaS
AEM Integration	Custom integration	Native connector	Custom via APIs
AI & Recommendations	Custom plugins	Built-in AI	Optional (via extensions)
Scalability	Depends on hosting	Auto-scaled	Auto-scaled
TCO	Medium (infra + support)	High (enterprise license)	Medium-High (usage-based)

🧠 Which One Should You Choose?

Your choice depends on your priorities:

Solr is ideal if you want full control and flexibility, and your team can manage the infrastructure.
Coveo is excellent for enterprise-grade search with personalization and analytics built-in.
Algolia is great for headless setups and high-speed front-end search experiences.

Final Thoughts

AEM as a Cloud Service provides the flexibility to integrate with modern search services through APIs and connectors. Whether you require speed, ML-driven relevance, or control and customization, there is a solution that fits your organization’s needs. Evaluate based on budget, tech stack, and long-term scalability.

Have you integrated any of these search services with AEM? Share your experience in the comments!

AEM Integration with Apache Solr: A Complete Technical Guide

Adobe Experience Manager (AEM) is a popular enterprise CMS. Integrating AEM with Apache Solr brings distributed indexing, advanced query features, and improved relevancy for large content platforms. This article walks through architecture choices, integration methods, configuration steps, sample code, and production best practices.

Why choose Solr for AEM?

Scalability — SolrCloud supports sharding and replication for large datasets.
Rich query features — faceting, boosting, spellcheck, suggestions.
Performance — optimized for high read loads and complex queries.
Custom scoring — advanced relevancy tuning for enterprise use cases.

Architecture overview

A typical integration places Solr as the external search engine while AEM remains the content source. Content authored and published in AEM is indexed in Solr. The front-end queries Solr for search results and displays them in AEM components or SPA layers.

Author/Publisher AEM  -->  Indexing Pipeline  -->  Solr (SolrCloud)
Front-end (AEM/Public)  <--  Search Service API  -->  Solr

Common integration methods

1. Replication Agent / Push-based indexing

Configure a custom replication agent in AEM that sends content to Solr whenever a page is activated. This is a pragmatic approach that hooks into existing authoring workflows.

2. Sling Event Listeners / OSGi Service

Implement an OSGi service or Sling event listener that reacts to resource changes and sends JSON documents to Solr. Provides fine-grained control and transformation logic.

3. Pull-based indexing (REST / Data Import Handler)

Expose a REST endpoint from AEM and configure Solr to pull content on a schedule. Simpler to implement but less real-time.

Solr schema — fields to include

Create a Solr core/collection (e.g. aem-index) and define fields that reflect the AEM content model:

id — unique identifier (recommend using the content path)
title, content, description
path — AEM page or resource path
last_modified — for incremental indexing
tags, type, author — for faceting/filtering
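
As an illustration of how the fields above might travel to Solr, the following minimal sketch serializes one page into the JSON array form that Solr's /update handler accepts. The class name SolrJsonDoc, the sample page path, and the sample values are assumptions made for this example; a real pipeline would more likely use SolrJ or a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a JSON body for Solr's /update handler from a field map. */
class SolrJsonDoc {

    /** Escapes the characters that JSON requires to be escaped inside a string. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes one value; lists become JSON arrays (multi-valued fields such as tags). */
    static String value(Object v) {
        if (v instanceof List) {
            return ((List<?>) v).stream()
                    .map(SolrJsonDoc::value)
                    .collect(Collectors.joining(",", "[", "]"));
        }
        return quote(String.valueOf(v));
    }

    /** Wraps a single document in a JSON array. */
    static String toUpdateBody(Map<String, Object> fields) {
        String doc = fields.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + value(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        return "[" + doc + "]";
    }

    public static void main(String[] args) {
        Map<String, Object> page = new LinkedHashMap<>();
        page.put("id", "/content/site/en/home");            // hypothetical page path
        page.put("title", "Home");
        page.put("path", "/content/site/en/home");
        page.put("last_modified", "2025-11-16T00:00:00Z");  // sample value
        page.put("tags", List.of("news", "search"));
        System.out.println(toUpdateBody(page));
    }
}
```

The resulting string could be sent as the body of a POST request with Content-Type application/json to the collection's /update endpoint.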

Example: Simple Sling Servlet to index one page

Paste this into an OSGi-enabled servlet in AEM (simplified example — production code needs error handling and batching):

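import java.io.IOException;
import javax.servlet.ServletException;
import org.apache.felix.scr.annotations.sling.SlingServlet;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

@SlingServlet(paths = "/bin/solr/index")
public class SolrIndexServlet extends SlingSafeMethodsServlet {
    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
            throws ServletException, IOException {
        String path = request.getParameter("path");
        Resource resource = (path == null) ? null : request.getResourceResolver().getResource(path);
        if (resource == null) {
            // Missing "path" parameter or unknown resource: nothing to index.
            response.sendError(SlingHttpServletResponse.SC_BAD_REQUEST, "Missing or unknown path");
            return;
        }

        // Read the properties to index, falling back to empty strings.
        String title = resource.getValueMap().get("jcr:title", "");
        String content = resource.getValueMap().get("jcr:description", "");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", path);
        doc.addField("title", title);
        doc.addField("content", content);

        // try-with-resources closes the HTTP client once the document is committed.
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/aem-index").build()) {
            solr.add(doc);
            solr.commit();
        } catch (SolrServerException e) {
            throw new ServletException("Solr indexing failed for " + path, e);
        }
    }
}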

Querying Solr from AEM

Use the SolrJ client or HTTP APIs inside a custom AEM service to run queries and return structured results to front-end components.

SolrQuery query = new SolrQuery();
query.setQuery("content:experience");
query.addFacetField("type");
QueryResponse resp = solrClient.query(query);
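
For the HTTP API route, a minimal sketch of the same faceted query without SolrJ might look like this. The class name SolrSelectUrl, the rows parameter, and the base URL are assumptions for illustration; the fetch method uses the JDK's java.net.http client (Java 11 or later) and is not called in main, so the example runs without a Solr server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper: builds and runs a faceted /select request over plain HTTP. */
class SolrSelectUrl {

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Builds the /select URL with one facet.field parameter per facet. */
    static String build(String collectionUrl, String q, List<String> facetFields, int rows) {
        StringBuilder sb = new StringBuilder(collectionUrl)
                .append("/select?q=").append(enc(q))
                .append("&rows=").append(rows)
                .append("&wt=json");
        if (!facetFields.isEmpty()) {
            sb.append("&facet=true");
            for (String f : facetFields) {
                sb.append("&facet.field=").append(enc(f));
            }
        }
        return sb.toString();
    }

    /** Sends the request and returns the raw JSON response body. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) {
        // Same query as the SolrJ snippet: content:experience, faceted on "type".
        System.out.println(build("http://localhost:8983/solr/aem-index",
                "content:experience", List.of("type"), 10));
    }
}
```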

Best practices for production

Use SolrCloud (collections, shards, replicas) for HA and scale.
Design your indexing pipeline for incremental updates (use last_modified).
Index ACLs or implement a permissions filter so search results respect AEM security.
Monitor Solr with metrics and alerts (Prometheus/Grafana + logs).
Implement retries and buffering — network failures between AEM and Solr are common and should be handled gracefully.
Plan for schema evolution and reindexing strategies.
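
The retry recommendation above can be sketched as a small helper with exponential backoff. The class name Retry, the attempt count, and the delays are assumptions for illustration; a production pipeline would typically also buffer failed documents in a queue rather than only retrying in place.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an action with exponentially growing delays. */
class Retry {

    static <T> T withBackoff(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break;              // out of attempts: rethrow below
                }
                Thread.sleep(delay);    // wait before the next attempt
                delay *= 2;             // double the wait each time
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated indexing call that fails twice before succeeding.
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Solr unreachable");
            }
            return "indexed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```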

Note: This guide provides a technical overview and starter code. For enterprise-grade implementations, consider batching, bulk-index workflows, schema management automation, and security auditing.

Conclusion

Integrating AEM with Solr opens powerful search capabilities for enterprise content platforms. Whether using push-based replication, an OSGi indexing service, or a mixed approach, the key is designing a reliable pipeline for indexing and a robust query layer for your front-end.

Wednesday, 25 July 2018

FAQ about Solr AEM Integrations

What are the differences between embedded and remote Solr usage in AEM?

Embedded Solr is recommended only for development purposes. Production search should be implemented with a remote (external) Solr, which makes the solution more scalable.
If the site has a small amount of content, embedded Solr could be used, but for larger content external Solr is recommended.
With external Solr we have more control over the schema, index options, field boosting, and other direct configuration.
External Solr is also recommended when content from third-party applications, not just from AEM, needs to be indexed.

Difference between Solr & Lucene

Lucene is the core search library, and Solr is a search server built on top of Lucene. Reading data from a Lucene index requires writing programs against its API, whereas Solr provides HTTP APIs and an admin UI, which makes it easier to read indexed data.

Advantages of using Solr search

The major advantages of using Solr as the indexing/search engine with AEM are listed below.

Quick learning curve.
Horizontal & vertical scaling through SolrCloud
Clustering through Apache ZooKeeper
Rich full text query syntax
Highly configurable relevance and indexing
Plugin architecture for query parsing, searching & ranking, indexing

Coveo Search with AEM

Wednesday, 2 August 2017

Steps to implement a search

This post discusses the steps to implement search in any application.

Index of search implementation blog can be found at this location

Set up

Setting up the search engine and its indexing options is the first step in a search implementation.

Hosting: Hosting the search tool.
Indexing: Indexing the data.
Index frequency: How often the index is refreshed (for example, daily or monthly).

Configure
This step includes any configuration related to search.

Metadata - Used for faceting, sorting, ranking, relevance.
Breadcrumbs - For better navigation.
Pagination - For better navigation.
Recent searches - Search assist.
Search suggestions (Did you mean) - Search assist.
Autocomplete - Search assist.

Fine-tune
Once the search implementation is done, we need to fine-tune the search by analysing the results.

Promotion: Promote selected results, based on analysis of the search results.
Dictionaries: Configure for better results.
Banners: Any organizational promotion.
Redirects: Send the user to a page.

General considerations when choosing your future search platform

Search management and maintenance: Think about migrating existing data and about future search upgrades.
User experience: Different user interfaces for a range of use cases.
Content: Where it resides, such as file repositories, document management systems, databases, or CRM applications.
Classifying and unifying cross-system data: Content processing and defining metadata for the new index.
Hybrid scenarios: Searching data from on-premise and cloud sources.
Result set customization: An efficient way of displaying results.
Ranking & relevance: Fine-tuning the order of results.
The migration process: Provide a transparent, solid experience to end users, and educate them about the new system.

Thursday, 14 May 2015

Search across Solr Cores

Solr Findings: Multi Core, Multi item search in Solr

Listed below are some features that are helpful while working with Solr.

1) Searching across all cores: There are cases where we need to search across multiple cores in Solr. The shards parameter enables this.
Solr's 'shards' feature splits huge indexes to make search faster. Cores can be treated as shards for an 'all core search'. As an example, say we have two cores (us_en, es_en).
Both cores can be queried using the parameters below:
http://localhost:8983/solr/us_en/select?shards=localhost:8983/solr/us_en,localhost:8983/solr/es_en&indent=true&q=*:*&df=title

Here df is the default query field, and q=*:* matches all documents.
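
When the list of cores changes over time, the shards parameter can be assembled in code. This is a minimal sketch: the class name ShardsQuery is hypothetical, and the host and core names are the ones from the example URL. The request is sent to the first core in the list, which then fans out to every entry in shards.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a cross-core query URL using the shards parameter. */
class ShardsQuery {

    static String build(String hostPort, List<String> cores, String q, String df) {
        if (cores.isEmpty()) {
            throw new IllegalArgumentException("at least one core is required");
        }
        // Each shards entry is host:port/solr/core, without the http:// scheme.
        String shards = cores.stream()
                .map(core -> hostPort + "/solr/" + core)
                .collect(Collectors.joining(","));
        return "http://" + hostPort + "/solr/" + cores.get(0) + "/select"
                + "?shards=" + shards
                + "&q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&df=" + URLEncoder.encode(df, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("localhost:8983", List.of("us_en", "es_en"), "*:*", "title"));
    }
}
```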

2) Multi-item search: To fetch several items with one query, pass a filter query (fq) that lists all of them, as shown below.

http://localhost:8983/solr/es_en/select?q=*:*&df=id&wt=json&indent=true&fq=(id:"id1" | id:"id2" | id:"id3")

Tested URL:
http://localhost:8983/solr/us_en/select?q=*%3A*&df=id&wt=json&indent=true&fq=(id:"id1" | id:"id2" | id:"id3")
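
A filter like this can also be generated from a list of ids. Note that this minimal sketch uses the standard query parser's explicit OR operator, in the form id:("id1" OR "id2"), rather than the | separator from the tested URL; that substitution, along with the class name IdFilter, is an assumption of this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds an fq value that matches any of the given ids. */
class IdFilter {

    /** Wraps a value in double quotes, escaping embedded backslashes and quotes. */
    static String quoteTerm(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Produces e.g. id:("id1" OR "id2" OR "id3"). */
    static String build(String field, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one id is required");
        }
        return ids.stream()
                .map(IdFilter::quoteTerm)
                .collect(Collectors.joining(" OR ", field + ":(", ")"));
    }

    public static void main(String[] args) {
        String fq = build("id", List.of("id1", "id2", "id3"));
        System.out.println(fq);
        // URL-encode the value before appending it to the request as &fq=...
        System.out.println("fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8));
    }
}
```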

Friday, 20 February 2015

Security precautions for Solr on Dispatcher

Things to take care of while configuring Solr behind the dispatcher:

Solr can be accessed using direct URLs. If we fail to block access to its UI, it becomes a vulnerability for the application. We should also block queries containing <delete> to keep the index safe.

How rules are created in dispatcher for security?

Some default security rules are enabled for the dispatcher; they may look like the ones below.

RewriteCond %{QUERY_STRING} ^.*(localhost|loopback|127\.0\.0\.1).* [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(\*|;|<|>|'|"|\)|%0A|%0D|%27|%3C|%3E||%7C|%26|%24|%25|%2B).* [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(;|<|>|'|"|\)|%0A|%0D|%27|%3C|%3E|).*(/\*|union|select|insert|cast|set|declare|drop|update|md5|benchmark).* [NC]

We need to rewrite them so that regular Solr queries are not blocked, while update/delete operations still are.

How should delete by id and delete all be prevented at the dispatcher?

Delete by id from Solr core en_US

If the id is ‘/content/project/us/en/home/testarticle.html’, invoking the URL below deletes that document from the index.
http://localhost:8983/solr/en_US/update?stream.body=<delete><query>id:"/content/project/us/en/home/testarticle.html"</query></delete>&commit=true

Delete all from Solr core en_US
Invoking the URL below deletes all data from the index of the core en_US. Think twice before executing this command, because it deletes *ALL* documents.
http://localhost:8983/solr/en_US/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true

Both requests target the /update handler with a <delete> body, so these are the patterns the dispatcher rules need to block.
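
The blocking decision itself can be expressed as a small predicate, shown here in Java purely to illustrate the logic. It is not a dispatcher configuration. The class name SolrRequestGuard and the exact set of checks are assumptions of this sketch, which decodes the query string first so that both the plain and the percent-encoded delete URLs are caught.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical guard: decides whether a request to Solr should be rejected. */
class SolrRequestGuard {

    static boolean isBlocked(String path, String rawQuery) {
        String p = path.toLowerCase(Locale.ROOT);
        // Any request to an update handler can modify the index.
        if (p.endsWith("/update") || p.contains("/update/")) {
            return true;
        }
        if (rawQuery == null) {
            return false;
        }
        String decoded;
        try {
            decoded = URLDecoder.decode(rawQuery, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return true;   // malformed percent-encoding: reject rather than guess
        }
        String q = decoded.toLowerCase(Locale.ROOT);
        // Block streamed bodies and delete commands on any handler.
        return q.contains("stream.body") || q.contains("<delete");
    }

    public static void main(String[] args) {
        System.out.println(isBlocked("/solr/en_US/select", "q=*:*&df=title"));
        System.out.println(isBlocked("/solr/en_US/update",
                "stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"));
    }
}
```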

How can publish avoid exposing the full Solr server URL?
Use the relative path '/solr' in pages/components that issue Solr queries, and add a rewrite rule on the dispatcher that maps /solr to the Solr server URL. This way the Solr server URL stays hidden behind the dispatcher.

How to search across all Solr fields for a query?

Solr offers many levels of configuration, which makes it a rich search tool. Usually we search Solr against a specific field defined in schema.xml, but there are cases where we need to search across multiple fields. Let us see how this can be achieved.

There are two ways to do this.

1. Using DisMax: Solr ships with the DisMax query parser, so in the query we just need to pass all the fields in the qf parameter, as shown below.

/select?defType=dismax&q="query1","query2","query3"&qf=field1 field2 field3

In the above case we are searching for three terms, query1, query2 and query3 (each wrapped in double quotes so that terms containing spaces are matched as phrases).
field1, field2 and field3 are the fields in schema.xml to be searched.
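
A request of this shape can be built programmatically as follows. This is a minimal sketch: the class name DismaxUrl is hypothetical, and it separates the quoted phrases with spaces instead of the commas used above, which is a choice made for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a DisMax request that searches several fields. */
class DismaxUrl {

    static String build(List<String> phrases, List<String> fields) {
        // Quote each phrase so that multi-word input is matched as a phrase.
        String q = phrases.stream()
                .map(p -> "\"" + p.replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" "));
        // qf takes a space-separated list of fields to search.
        String qf = String.join(" ", fields);
        return "/select?defType=dismax"
                + "&q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&qf=" + URLEncoder.encode(qf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("query1", "query 2"),
                List.of("field1", "field2", "field3")));
    }
}
```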

2. Another way is to collect all the data into a single field by copying it through schema.xml.

We need the line below in the schema file:

<field name="datacollection" type="text_general" indexed="true" stored="false" multiValued="true"/>

Then copy the contents of the required fields to the new field:

<copyField source="field1" dest="datacollection"/>
<copyField source="field2" dest="datacollection"/>
<copyField source="field3" dest="datacollection"/>

Then query against datacollection, for example by setting it as the default field (df).