
    Sitecore Search Source Types – Part I

    April 7, 2025

    Sitecore Search is a robust search solution designed to streamline the indexing and retrieval of content with ease. Supporting a wide range of source types, it empowers developers to integrate various content repositories without breaking a sweat. In this blog, we’ll take a deep dive into the different Sitecore Search source types, complete with implementation examples, to help you hit the ground running—and maybe even have a little fun along the way! Because let’s face it, even search solutions can be exciting when you know what you’re doing. Ready? Let’s search for success!

    Sitecore Search supports multiple content sources, including web crawlers, API-based sources, Sitecore Content (XM/XP), database sources, and file-based sources.

    Web Crawler & Web Crawler (Advanced)

    Sitecore Search web crawlers are used to index external websites such as marketing pages, blogs, or help documentation. They can extract content, metadata, titles, and links to unify search across sources. The crawlers support pagination, respect robots.txt, and can follow links, including links to PDFs. They work with public-facing sites or gated content, depending on authentication support. The basic crawler is best for static HTML, while the advanced crawler adds support for dynamic content and API-based sources.

    The basic web crawler is suitable for crawling simple blogs or marketing pages, extracting standard elements like title, body, and metadata, and handling basic pagination. It can also use sitemaps or simple URL filters, and it supports basic authentication for gated content. For more complex scenarios, the advanced crawler is required. It supports authenticated content using tokens or custom headers, can extract and process PDF links, and handles DOM-based or multi-template extraction. The advanced crawler also works well for indexing multilingual websites, crawling structured content like tables or schema.org metadata, and accessing dynamic or JavaScript-heavy sites by targeting API endpoints.
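
    To make the idea of URL filters and robots.txt handling more concrete, here is a minimal Python sketch of the kind of check a crawler performs before visiting a page. It is an illustration only, not Sitecore Search's implementation or configuration format. The robots.txt content, the path prefixes, the user agent name, and the example.com URLs are all assumptions made up for this example.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/drafts/
"""

# Hypothetical include filter: only blog and docs paths are in scope (assumption).
ALLOWED_PREFIXES = ("/blog/", "/docs/")


def should_crawl(url: str, robots: RobotFileParser,
                 user_agent: str = "example-crawler") -> bool:
    """Return True if the URL passes both the path filter and robots.txt."""
    path = urlparse(url).path
    if not path.startswith(ALLOWED_PREFIXES):
        return False
    return robots.can_fetch(user_agent, url)


# Parse the rules from a string so the sketch runs without network access.
robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

candidates = [
    "https://example.com/blog/post-1",      # in scope, allowed
    "https://example.com/blog/drafts/wip",  # in scope, blocked by robots.txt
    "https://example.com/docs/guide.pdf",   # in scope, allowed (a PDF link)
    "https://example.com/careers",          # outside the include filter
]
to_crawl = [url for url in candidates if should_crawl(url, robots)]
```

    Here the include filter narrows the crawl to the sections you care about, and the robots.txt check then removes anything the site owner has asked crawlers to skip.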

    API Crawler

    Consider an organization that has product data stored in a headless CMS or a custom e-commerce platform. Each product is available through an API endpoint that accepts a GraphQL-style query like:

    query {
      products {
        id
        name
        description
        price
        image {
          url
          altText
        }
      }
    }

    This query retrieves structured product data along with media information (image URL and alt text), which can be mapped to Sitecore Search index fields for display in search results or personalized experiences.

    The goal is to make this content searchable in Sitecore Search with structured metadata (name, description, price, categories, images).
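
    As a rough illustration of that mapping step, the Python sketch below flattens one product from a response shaped like the query above into a flat document. The index field names (image_url, image_alt) and the sample data are assumptions for this example. In Sitecore Search itself, the mapping is defined in the source's extractor settings rather than in standalone code like this.

```python
from typing import Any


def to_index_document(product: dict[str, Any]) -> dict[str, Any]:
    """Flatten one product record into a flat, index-friendly document."""
    # The image object may be missing or null, so fall back to an empty dict.
    image = product.get("image") or {}
    return {
        "id": str(product["id"]),
        "name": product["name"],
        "description": product.get("description", ""),
        "price": float(product["price"]),
        "image_url": image.get("url"),       # hypothetical index field name
        "image_alt": image.get("altText"),   # hypothetical index field name
    }


# Sample response shaped like the query above (made-up data).
sample_response = {
    "data": {
        "products": [
            {
                "id": 101,
                "name": "Trail Backpack",
                "description": "A 30-litre hiking backpack.",
                "price": "89.50",
                "image": {
                    "url": "https://example.com/img/backpack.jpg",
                    "altText": "Green trail backpack",
                },
            },
            {"id": 102, "name": "Water Bottle", "price": 12, "image": None},
        ]
    }
}

documents = [to_index_document(p) for p in sample_response["data"]["products"]]
```

    Flattening nested objects and normalizing types (string ids, numeric prices) up front keeps the indexed fields predictable for filtering and sorting.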

    The API crawler is ideal when data isn’t available as public HTML pages or when there’s a need for full control over indexing. It works by sending GET requests to the API, parsing the JSON response, and mapping the data to Sitecore Search index fields. It supports pagination, token-based authentication, and custom headers, making it perfect for secure or complex integrations. You can filter, transform, or enrich data before indexing, which is especially useful for frequently updated sources like product catalogs or content managed in headless CMS platforms.
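
    The request loop described above can be sketched in Python as follows. This is a simplified illustration of paginated, token-authenticated fetching, not Sitecore Search's API crawler configuration. The response shape (items and next_page keys), the bearer token header, and the fake fetch function are assumptions for this example. The fetch function is passed in so the sketch runs without network access.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]
FetchPage = Callable[[int, dict[str, str]], Page]


def build_headers(token: str) -> dict[str, str]:
    """Build request headers for token-based authentication (assumed scheme)."""
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}


def iter_items(fetch_page: FetchPage, token: str) -> Iterator[dict[str, Any]]:
    """Yield every item across all pages, following the next_page pointer."""
    headers = build_headers(token)
    page = 1
    while page is not None:
        body = fetch_page(page, headers)
        yield from body.get("items", [])
        page = body.get("next_page")  # None (or missing) ends the loop


# Stand-in for a real HTTP GET call; returns canned pages (made-up data).
FAKE_PAGES: dict[int, Page] = {
    1: {"items": [{"id": 1}, {"id": 2}], "next_page": 2},
    2: {"items": [{"id": 3}], "next_page": None},
}


def fake_fetch(page: int, headers: dict[str, str]) -> Page:
    if not headers.get("Authorization", "").startswith("Bearer "):
        raise PermissionError("missing bearer token")
    return FAKE_PAGES[page]


items = list(iter_items(fake_fetch, token="example-token"))
```

    In a real integration, fake_fetch would be replaced by an HTTP client call to the source API, and each yielded item would then be mapped to index fields.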

    What to Keep in Mind

    When implementing Sitecore Search, it’s crucial to consider factors like content freshness (no one likes outdated results), indexing frequency (because a once-a-year refresh isn’t cutting it), and data structure (keep it clean or risk a search disaster). If you’re working with JavaScript-heavy websites, be prepared—web crawlers might get overwhelmed, so some extra configuration might be required. For API-based sources, make sure you handle rate limits and authentication properly, or you’ll be stuck waiting for permission to proceed. And when indexing Sitecore CMS content, remember to factor in versioning and workflow states—after all, only the published content should make it to the index. With a little attention to detail, your search results will be top-notch, and everyone will think you’re a Sitecore Search wizard!
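
    For the rate-limit point above, one common approach is to retry with exponential backoff when the source API signals that you are sending requests too quickly. The Python sketch below shows the general pattern. The RateLimited exception, the retry counts, and the delays are assumptions for illustration, and nothing here is specific to Sitecore Search.

```python
import time
from typing import Any, Callable


class RateLimited(Exception):
    """Raised by the (hypothetical) fetch function on an HTTP 429 response."""


def fetch_with_backoff(fetch: Callable[[], Any],
                       max_attempts: int = 4,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fetch(), doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle it
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...


# Demo: a fake fetch that is rate limited twice before succeeding.
calls = {"count": 0}


def flaky_fetch() -> dict[str, str]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited()
    return {"status": "ok"}


delays: list[float] = []  # record the waits instead of actually sleeping
result = fetch_with_backoff(flaky_fetch, sleep=delays.append)
```

    Passing the sleep function in as a parameter keeps the example fast to run and easy to test. In production code the default time.sleep would apply.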

    Sitecore Search provides a range of flexible source types to meet all your indexing needs, ensuring businesses can deliver a seamless and efficient search experience. Whether it’s website content, structured data, or document-based information, Sitecore Search has the tools to make everything searchable and accessible, like a super-powered search engine but without the superhero cape (though we’re sure it’d look good).

    In my next blog, we’ll explore more Sitecore Search source types and their unique use cases. It’s going to be a journey, and no, you won’t need a compass, just a good internet connection and maybe a cup of coffee! Stay tuned for more!

    For a comprehensive overview of Sitecore Search, including crawlers, extractors, and widgets, feel free to refer to my earlier blog post: Making Sense of Sitecore Search: Crawlers, Extractors, and Widgets.
