Show HN: E-commerce data from 100k stores, refreshed daily

searchagora.com

4 points by astronautmonkey 5 hours ago

Hi HN! I'm building Agora, an AI search engine for e-commerce that returns results in under 300ms. We've indexed 30M products from 100k stores and made them easy to purchase through AI agents.

After launching here on HN, a large enterprise reached out to pay for access to the raw data. We fulfilled that first contract manually to learn the exact workflow, then decided to productize it as the "Data Connector" so we could scale to more customers.

The Data Connector lets developers select any of the 100k stores in our index, view sample data, format the output, and export up-to-date data as CSV or JSON.
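
To make that concrete, here's a rough sketch in Python of what an export call could look like, including reshaping a JSON export into CSV. The endpoint, parameters, and field names below are illustrative placeholders, not our final API:

    import csv
    import requests

    # Illustrative endpoint and auth -- placeholders, not the final API.
    resp = requests.get(
        "https://api.searchagora.com/v1/export",
        params={"store": "example-store.com", "format": "json"},
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        timeout=60,
    )
    products = resp.json()["products"]

    # Reshape the JSON export into a CSV with the fields you care about.
    fields = ["title", "price", "in_stock"]
    with open("products.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for p in products:
            writer.writerow({k: p.get(k) for k in fields})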

We've built crawlers for Shopify, WooCommerce, Squarespace, Wix, and custom-built stores to index store information, product data, stock, reviews, and more. The primary technical challenge is recrawling the entire dataset every 24 hours. We do this with a fleet of servers that recrawl different store types through rotating local proxies and push any detected changes onto a queue that updates our search index. Our primary database is MongoDB, and search runs on self-hosted Meilisearch on high-RAM servers.
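
To give a flavor of the recrawl loop, here's a stripped-down sketch of one worker handling a Shopify store (Shopify shops expose a public /products.json endpoint). The collection and index names are placeholders, and the real queue and proxy rotation are omitted:

    import requests
    import pymongo
    import meilisearch

    # Placeholder connection details -- not our production setup.
    mongo = pymongo.MongoClient("mongodb://localhost:27017")
    products = mongo["agora"]["products"]          # hypothetical collection
    search = meilisearch.Client("http://localhost:7700", "masterKey")
    index = search.index("products")               # hypothetical index name

    def recrawl_shopify(store_url, proxy=None):
        """Fetch a store's catalog and upsert only the changed products."""
        resp = requests.get(
            f"{store_url}/products.json",
            params={"limit": 250},
            proxies={"https": proxy} if proxy else None,
            timeout=30,
        )
        changed = []
        for item in resp.json().get("products", []):
            doc = {
                "id": f"shopify-{item['id']}",
                "title": item["title"],
                "store": store_url,
                "updated_at": item["updated_at"],
            }
            prev = products.find_one({"id": doc["id"]})
            # Only new or changed products get written and re-indexed.
            if prev is None or prev["updated_at"] != doc["updated_at"]:
                products.replace_one({"id": doc["id"]}, doc, upsert=True)
                changed.append(doc)
        if changed:
            index.add_documents(changed)  # Meilisearch upserts by primary key

    recrawl_shopify("https://example-store.myshopify.com")

In production the changed documents would go onto the queue rather than straight into the search index, so indexing throughput is decoupled from crawl throughput.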

My vision is to index the world's e-commerce data. I believe this will create market efficiencies for customers, developers, and merchants.

I'd love your feedback!

amcunicorns 4 hours ago

Nice idea! Sounds like a lot of servers are needed to pull this off.

  • astronautmonkey 4 hours ago

    Thank you! And yes, the number of servers needed to scale from 100k to 1M stores (the next goal) will be significant.