How I Hosted My Search Engine with My NAS

 

Creating a Personalized Search Engine Using Network Attached Storage

Ad platforms like Google hold vast amounts of user data, and search results are increasingly tailored to the needs of advertisers and website owners. As privacy and personalization become more important, running your own search engine brings several advantages: you can tune search to your own requirements and keep more control over your data. I recently started a project at home to host my own search engine on a Network Attached Storage (NAS) device. It was a painful but fulfilling exploration of personal search engines, and in this post I will show you step by step how to get there and what you can take away from it.

A NAS is most commonly used for sharing files within a home or office network, but it is more versatile than that. Besides hosting websites and virtual machines, it can host your very own search engine, as mine now does. It may seem hard to believe, but with the right tools and some configuration it is possible. I now have a full-blown private search engine that indexes my personal files and any other content I consider important.

Step 1: Choosing the Right NAS for the Job

The first step is to get the right NAS device to host your search engine. Basic NAS models work for file storage and backup, but they might not have the CPU needed to handle search engine workloads. I worked with a Synology NAS; Synology has a good reputation for feature-rich, customizable devices.

Either way, make sure your NAS has a powerful processor (an Intel CPU is preferable) and enough RAM for good performance. I chose a model with a quad-core Intel processor and 8GB of RAM, which was enough for the task. You should also factor in storage capacity: the search engine itself is pretty lightweight, but the indexed data won’t be. If you index a large amount of data, you will need correspondingly large storage.
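To get a feel for how much data you would be indexing before you pick a model, a short script can total up file sizes by type. This is only a rough sketch: the share path `/volume1/documents` is a hypothetical example, and the size of the resulting index depends on the indexer, so treat the numbers as a guide to the volume of content rather than a storage requirement.

```python
import os
from collections import defaultdict


def size_by_extension(root):
    """Return total bytes per lowercase file extension under root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += size
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical share path; replace it with a folder on your own NAS.
    totals = size_by_extension("/volume1/documents")
    for ext, total in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{ext:10s} {total / 1_000_000:10.1f} MB")
```

Running it against each folder you plan to index shows which file types dominate, which helps when deciding what to include in the crawl.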

You also need a NAS that supports Docker containers. Docker lets you run applications in isolated containers, which puts less strain on the NAS and makes it easy to manage your search engine without disturbing the other functions of the NAS.

Step 2: Setting Up Docker on the NAS

With the right NAS in place, I set up Docker next. Docker is an open-source project for creating lightweight, portable, self-sufficient containers from any application. It lets you run multiple services side by side without them interfering with each other.

Installing Docker on a Synology NAS is pretty easy. You will find it in the Synology Package Center, and the installation only takes a few clicks. Once Docker was up and running, I looked for a search engine that would work well in a containerized environment. After a little research I settled on Searx, an open-source metasearch engine.

Searx is a good choice for a privacy-focused search tool: it collects results from various search engines without recording your searches or storing personal information. It also provides a personalized user experience and advanced filters for search results. Just as importantly, Searx is efficient and runs well on a small machine such as a NAS, which makes it a good fit for home-hosted instances.

Step 3: Installing and Configuring Searx

Once I had Docker set up, the next step was to get Searx going. Searx can be installed in several ways, but Docker made it relatively simple. In my case, all I needed to do was pull the Searx Docker image from Docker Hub and configure it to run within the NAS environment.

After pulling the image, I had to create a Docker container for Searx. This included setting up the network, ports, and environment variables. Some settings are important for the instance to work, such as defining the base URL, and thankfully they are well documented.
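As a rough illustration of the container setup described above, the sketch below assembles a `docker run` command in Python without executing it. The image name `searx/searx`, the container port 8080, the `BASE_URL` variable, and the `/etc/searx` mount point are assumptions based on my understanding of the Searx Docker image, so check them against the image's documentation. The hostname `nas.local` and the host folder are hypothetical.

```python
import shlex


def searx_run_command(base_url, host_port=8080, config_dir=None,
                      image="searx/searx", name="searx"):
    """Build a `docker run` argument list for a Searx container."""
    cmd = [
        "docker", "run", "-d",
        "--name", name,
        "--restart", "unless-stopped",
        "-p", f"{host_port}:8080",      # host port -> assumed container port
        "-e", f"BASE_URL={base_url}",   # assumed variable for the public URL
    ]
    if config_dir:
        # Mount a host folder so the settings survive container rebuilds.
        cmd += ["-v", f"{config_dir}:/etc/searx"]
    cmd.append(image)
    return cmd


# Hypothetical hostname and folder; print the command for review.
cmd = searx_run_command("http://nas.local:8080/",
                        config_dir="/volume1/docker/searx")
print(shlex.join(cmd))
```

Printing the command first makes it easy to review before pasting it into a shell on the NAS, or before translating the same port, variable, and volume settings into the Synology Docker interface.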

Of these, one of the most important was configuring the search engines. Searx lets you choose which search engines it aggregates. I wanted to keep my searches private, so I unchecked the engines that collect user data and kept only DuckDuckGo, Startpage, and Qwant. Searx also offers some advanced filtering options that help you show only the results you prefer.
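The same choice can be expressed in the configuration file rather than through checkboxes. The sketch below assumes the Searx `settings.yml` has been loaded into a Python dictionary whose `engines` list holds entries with a `name` and an optional `disabled` flag; the sample data is illustrative and much shorter than a real settings file, and `keep_only` is a hypothetical helper.

```python
def keep_only(settings, allowed):
    """Return a copy of settings with every engine outside `allowed` disabled."""
    allowed_lower = {name.lower() for name in allowed}
    engines = []
    for engine in settings.get("engines", []):
        updated = dict(engine)  # copy so the input is left untouched
        updated["disabled"] = engine["name"].lower() not in allowed_lower
        engines.append(updated)
    return {**settings, "engines": engines}


# Sample data shaped like the "engines" list in a Searx settings file.
settings = {
    "engines": [
        {"name": "duckduckgo", "engine": "duckduckgo"},
        {"name": "google", "engine": "google"},
        {"name": "startpage", "engine": "startpage"},
        {"name": "qwant", "engine": "qwant"},
        {"name": "bing", "engine": "bing"},
    ]
}

result = keep_only(settings, ["DuckDuckGo", "Startpage", "Qwant"])
for engine in result["engines"]:
    print(engine["name"], "disabled" if engine["disabled"] else "enabled")
```

An allow-list has the advantage that any engine added by a later Searx update stays off until you explicitly approve it.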

Finally, I tweaked the user interface to match my preferences. The Searx UI is highly customizable: you can alter the layout, the theme, and even the behavior of search requests. This is one of my primary motivations for self-hosting: I can have a search engine designed specifically for me.

Step 4: Indexing Personal Files and Websites

Once Searx was running, I needed something of my own to search. One of my primary motivations for hosting a search engine was to index my files privately and have a personal, searchable file store on the NAS. To that end, I needed Searx to cover my local data as well, so that it could return results from the disk alongside web results.

I accomplished this using Apache Nutch, a free, open-source web crawler written in Java. Nutch is an all-in-one crawling and index-building solution for websites and local files. I configured Nutch to crawl certain directories on my NAS and to update the index frequently. This way my search engine always had the most recent versions of my files.

Configuring the crawler required me to set my file paths and define which types of files I wanted indexed (PDFs, Word documents, images). I also included my favorite websites in the crawl list, which means I can search their content without visiting each of them separately.
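Nutch takes its include and exclude patterns from a file named regex-urlfilter.txt, where each line is a `+` or `-` followed by a regular expression and the first matching line decides whether a URL is crawled. The sketch below mimics that behavior in Python so you can try out a rule set before handing it to the crawler. It is not Nutch itself, and the share path and website are hypothetical. As far as I know, crawling local files also requires enabling the file protocol plugin in Nutch, so check the Nutch documentation for that step.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line[0] not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(rules, url):
    """First matching rule decides; a URL that matches nothing is rejected."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


RULES = parse_rules(r"""
# skip hidden and temporary files
-/\.[^/]*$
-\.(tmp|bak)$
# documents and images under a hypothetical NAS share
+^file:///volume1/documents/.*\.(pdf|docx?|jpe?g|png)$
# a hypothetical favorite website
+^https://blog\.example\.org/
""")

for url in ("file:///volume1/documents/report.pdf",
            "file:///volume1/documents/draft.bak",
            "https://blog.example.org/posts/1"):
    print(url, "crawl" if accepts(RULES, url) else "skip")
```

Putting the exclusions first matters because the first match wins: a hidden file ending in `.pdf` is skipped before the include rule is ever consulted.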

Step 5: Securing the Search Engine

Security is always an issue when hosting any service online, even more so if that service handles personal data. I needed to make sure my search engine was sound and that no bad actors could poke at it. Here are some of the security measures that helped me get there.

The first thing I did was set up HTTPS to encrypt the connection between my browser and the search engine. Let’s Encrypt provides free SSL/TLS certificates that are easy to set up and configure on the NAS. With HTTPS in place, whatever data passes between my devices and the search engine is encrypted.

The other thing I did was allow only trusted devices on my home network to access the search engine. I do not need external access, so I configured the firewall on my NAS to block all incoming traffic from outside my subnet. I also turned on two-factor authentication for my NAS login so that no one can take over the device itself.
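The subnet rule boils down to a membership test on the client address. The sketch below shows that logic with Python's standard `ipaddress` module. The subnet `192.168.1.0/24` is a hypothetical home network, and the actual enforcement happens in the NAS firewall settings rather than in a script like this.

```python
import ipaddress

# Hypothetical home LAN; replace with your own subnet.
TRUSTED_SUBNET = ipaddress.ip_network("192.168.1.0/24")


def is_trusted(client_ip, subnet=TRUSTED_SUBNET):
    """Return True when client_ip falls inside the trusted subnet."""
    try:
        return ipaddress.ip_address(client_ip) in subnet
    except ValueError:
        return False  # malformed addresses are never trusted


for ip in ("192.168.1.42", "192.168.2.7", "203.0.113.9", "not-an-ip"):
    print(ip, "allow" if is_trusted(ip) else "deny")
```

Denying by default, including for input that does not parse as an address, mirrors how the firewall rule should behave: anything not explicitly inside the trusted range is dropped.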

Final Thoughts: The Benefits of a Self-Hosted Search Engine

Some of the benefits I get from hosting my own search engine on my NAS might seem small, but together they make it worth the effort. First and foremost, I am now in total control of my search. No more third-party tracking or irrelevant ads clogging up my results. I also have a searchable index across my own personal files, which makes finding documents and data stored on my NAS much easier.

Although it may seem complicated at a glance, creating a self-hosted search engine is hardly an impossible task when you have the right software and some time to experiment. Whether you are concerned about privacy, a geek, or simply want a customized search experience, hosting your own search engine can be a very rewarding project. My NAS has now become a powerful, full-featured device that stores my data and gives me search customized to how I need it.