Building a Personal AI Assistant App from Scratch

Oliver Hu
Jan 3, 2025


Episode 1: motivation and retrospectives.

Why

AI is eating the world. Is this hype, or is it becoming reality?

Nvidia has become a trillion-dollar company. OpenAI, Anthropic, xAI, and many other AI companies are betting on the latest LLM technology to change the world. Ironically, I don’t feel my life has been disrupted by AI that much; it is nowhere near an iPhone moment.

So I spent some time reflecting on that, and there are a few reasons:

At work:

  • Management: most managerial work is interpersonal, and there is no easy way for AI to help with my job.
  • Development: the enterprise AI (a.k.a. Glean) doesn’t have access to most Google files (to avoid leakage) or to proprietary source code. The code base is a huge monorepo, and it is very challenging for a non-fine-tuned model to perform on it anyway.

In daily life:

  • Personalization: most AI products have no access to my personal data, making it really hard for them to provide personalized support in my daily life. A few examples: (1) I can’t ask ChatGPT/Perplexity to recommend a restaurant I would like without giving extensive details in my prompt, and it might not have my timezone or location information anyway; (2) there is no way for ChatGPT to proactively perform actions for me.

Alright, I need a personalized assistant. Why not go with local LLMs? There are so many open-source models to use, and Ollama is a fairly cool serving application that provides a local endpoint. Also, the conventional wisdom nowadays is to keep your data local as much as possible for privacy, so why would you want to go with a cloud offering?

It is not realistic to build a 100% offline LLM assistant capable enough that I would actually use it.

Challenges of a locally hosted model:

  • Small models don’t perform well. There are many small open-source models you can technically deploy locally, like Llama 3.2 1B/3B or Llama 3.1 8B. However, from my exploration, those models are nowhere close to GPT-4o in reasoning, function calling, or other capabilities. You only get a crappy version of the SOTA AI. Building an AI application is hard, and building on a low-quality model makes your life 100x worse.
  • Slow. Even with a small local model like 3B or 8B, it is hard to reach sub-second time-to-first-token (TTFT) latency. Output token throughput is even worse; you would need an RTX 4090 to serve an 8B model fast enough. (See the sketch after this list for one way to measure TTFT.)
  • Cross-device sharing. Another issue with local LLMs: you can probably host a decent 8B model on your Mac or desktop, but there is no way to jam an 8B model, or even a 3B model, into your phone without aggressive quantization. Even with quantization, the power consumption is huge and the latency is not acceptable: fine as a toy, but not for daily use. The other issue is that you need a vector database to store the indexed information for RAG, and if it is not in the cloud, there is no way to access that information across devices. In today’s world, most folks have both a laptop/desktop and a phone.
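To put a number on the latency point, below is a minimal sketch that measures TTFT against a local model. It assumes Ollama is running on its default port with a model already pulled (llama3.1:8b here is just an example name); since Ollama exposes an OpenAI-compatible endpoint, the official openai client works as-is.

```python
# Minimal TTFT measurement against Ollama's OpenAI-compatible endpoint.
# Assumes `ollama serve` is running locally and `ollama pull llama3.1:8b` was done.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is unused

start = time.perf_counter()
stream = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Recommend a quick dinner idea."}],
    stream=True,
)

first_token_at = None
for chunk in stream:
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()  # first generated token arrived

if first_token_at is not None:
    print(f"TTFT: {first_token_at - start:.2f}s")
```

Swapping base_url and model for a hosted endpoint makes an easy apples-to-apples comparison.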

So I pitched the idea of building a personalized AI assistant to a few friends, and we decided to take the Christmas break to build something from scratch!

Doodle, your personal AI assistant

It is available for download, btw! It is free until it scales beyond what I can host in my home data center. It is named after my dog (a smart companion, for real) :D

Current capabilities:

  • He understands you. Doodle integrates with Google Workspace, so he has access to your Google data, and your files get indexed. When you ask Doodle a question, he responds with context from your Google accounts.
  • Internet search. He can search the Internet when necessary and gives you sources.
  • Native Apple application with OS integrations. The app is built natively (for most pages); you can invoke it with a shortcut and update your documents either in place or in the Doodle app.

The app is still under heavy development, but I personally feel it is already useful enough as a personal alternative to Perplexity. Below are some learnings along the way!

Learnings

On-prem is actually more accessible than the cloud if you have some basic system admin skills

We often say the cloud makes it possible to build many startups. However, I would argue the cloud is actually too expensive for exploring ideas. For my home lab, I spent a total of ~$2,000 (mostly on discounted marketplace/eBay gear) and got:

  1. RAID-Z2 storage of ~100TB, managed by TrueNAS (8x 18TB HDDs at ~$150 each, plus a $200 disk enclosure).
  2. A cluster of 5 mini-PCs (N100 CPUs for power savings, 16GB RAM, ~$100 each).

That 100TB alone would run $2,000+/month in the cloud, and renting EC2 equivalents of my 5 mini-PCs costs about $400/month. Adding it up, that is $2,400+/month, a pretty significant recurring cost.
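As back-of-the-envelope math (using my rough estimates above, not actual cloud quotes):

```python
# Rough break-even: one-time home-lab cost vs. estimated monthly cloud rent.
home_lab_one_time = 8 * 150 + 200 + 5 * 100  # disks + enclosure + mini-PCs, ~$1,900
cloud_monthly = 2000 + 400                   # ~100TB storage + 5 small instances

print(f"Break-even after ~{home_lab_one_time / cloud_monthly:.1f} months")
# -> Break-even after ~0.8 months
```

In other words, the hardware pays for itself in under a month of equivalent cloud rent.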

A home lab is a much better way to bootstrap a service than relying purely on the cloud. Of course, it is nowhere close on reliability (losing power at home has already been a challenge) or security SLAs, but it is serious enough to support a small app. :)

OpenAI models are surprisingly cheap and better than I expected

We used OpenAI pretty much at will during about two weeks of development. The embedding model cost us almost nothing ($2–5 for indexing our files again and again), and gpt-4o-mini cost about $10. There were some infinite loops in our code base that fed excessively long inputs to the models (50M+ tokens for some reason), but the overall cost was still <$10/week.
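One cheap guardrail against that kind of runaway spend is counting tokens locally before anything hits the API. A minimal sketch using tiktoken; the $0.02-per-1M-token rate for text-embedding-3-small reflects published pricing at the time of writing, and the 20k-token cap is an arbitrary assumption to tune:

```python
# Count tokens locally to estimate embedding cost and catch looping prompts early.
import tiktoken

EMBEDDING_PRICE_PER_1M = 0.02  # USD; text-embedding-3-small, check current pricing

enc = tiktoken.encoding_for_model("text-embedding-3-small")

def embedding_cost(texts: list[str]) -> float:
    """Estimated cost in USD to embed all texts."""
    total_tokens = sum(len(enc.encode(t)) for t in texts)
    return total_tokens / 1_000_000 * EMBEDDING_PRICE_PER_1M

def check_prompt(prompt: str, max_tokens: int = 20_000) -> str:
    """Fail loudly instead of silently sending an infinite-loop-sized prompt."""
    n = len(enc.encode(prompt))
    if n > max_tokens:
        raise ValueError(f"Prompt is {n} tokens; something is probably looping.")
    return prompt
```

A hard cap like this would have caught our 50M-token accident on the first looped request.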

As someone working in the AI platform field, we have been telling the story that open-source models are better and cheaper for your business. That is ONLY true when you have an established team to support a deployment of Meta’s Llama models, and such a specialized team costs millions of dollars per year to build and run.

You might also hear about other models like Anthropic, Gemini, or DeepSeek. Well, if there is one takeaway —

FIRST, AVOID GEMINI for your development.

It is extremely challenging to build apps using Google’s products. The offering is deeply fragmented (there are 3+ SDKs for its AI products: genai, generative-ai, vertex), and when you search online, you have no freaking clue which doc is for which. Or try just finding the usage of your API calls; it is almost impossible to get that information. The documentation is also very confusing, and there are incompatible capabilities between gemini-1.5 and gemini-2.0 (e.g., dynamic retrieval for grounded search). The results from the model are also pretty bad when you use GoogleSearch as a tool: the sources are not baked into the response but live in a separate grounding_metadata section, which the model itself can’t use to link sources directly in its response. AVOID GEMINI.
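For a concrete picture of that last complaint, here is roughly what grounded search looks like with the newer google-genai SDK (one of the 3+ SDKs mentioned above). This is a sketch based on its docs at the time of writing; names may have shifted between SDKs and versions:

```python
# Sketch: Gemini with GoogleSearch grounding via the google-genai SDK.
# Note how sources arrive in grounding_metadata, separate from the answer text.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
resp = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What are good ramen places near me?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

print(resp.text)  # the answer, with no inline source links

# The sources live in a separate metadata blob; stitching citations into the
# response text is left entirely to you.
meta = resp.candidates[0].grounding_metadata
for chunk in meta.grounding_chunks or []:
    print(chunk.web.title, chunk.web.uri)
```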

DeepSeek v3 is good, but the latency is non-negligible; a simple request takes about 1s to come back. My guess is that DeepSeek doesn’t have a data center in the US yet, hence the huge latency.

I haven’t tested Anthropic extensively yet, but OpenAI is really good enough to start with until your needs get really advanced.

Copilot is SUPER helpful for coding general applications; use Claude 3.5

Copilot is definitely a game changer when you develop outside of your corporate work. For 50% of the simple tasks in Django applications or SwiftUI development, a single prompt just worked. A few iterations make it work for another 30% of the cases. For the last 20%, you will have to figure things out yourself. Nonetheless, you will learn from the copilot how things work in a stack or library you are less familiar with.

One takeaway: use Claude 3.5; it is much better than GPT-4o in my personal experience.

Popular agent frameworks are overly complicated

There are, in general, two libraries for you to choose from: LangChain and LlamaIndex. For some reason I started with LlamaIndex, then realized its agent capabilities were relatively immature, so I switched to LangChain for agent work (I still use LlamaIndex for indexing). IT IS SUPER COMPLICATED and carries non-trivial performance overhead.
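For reference, the LlamaIndex piece I kept is essentially its quickstart flow. A minimal sketch, where ./data and the query string are placeholders:

```python
# Minimal LlamaIndex indexing + query sketch (quickstart-style).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()   # any local files
index = VectorStoreIndex.from_documents(documents)        # chunk + embed + store

query_engine = index.as_query_engine()
print(query_engine.query("What do my notes say about travel plans?"))
```

This part of the stack stayed pleasantly simple; it is the agent layer where things got complicated.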

It is a bit hard for me to articulate why LangChain is hard to use, but even a simple task is abstracted behind layers of APIs, making it non-trivial to customize or to understand what is going on behind the scenes. If you get stuck, just read its code; that is more effective than guessing and tweaking other APIs.

The performance is also concerning: about 30% overhead on top of raw OpenAI calls for structured output. I think I will replace some of the LangChain wrappers with my own implementation sometime in the future.
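For a sense of what “my own implementation” would mean: the raw OpenAI SDK supports schema-constrained structured output directly, so the wrapper buys less than you might think. A minimal sketch, where RestaurantPick is a made-up schema for illustration (LangChain’s equivalent is llm.with_structured_output(RestaurantPick)):

```python
# Sketch: structured output with the raw OpenAI SDK, no LangChain wrapper.
from openai import OpenAI
from pydantic import BaseModel

class RestaurantPick(BaseModel):
    """Hypothetical output schema for illustration."""
    name: str
    cuisine: str
    reason: str

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Pick a ramen spot in SF and say why."}],
    response_format=RestaurantPick,
)

pick = completion.choices[0].message.parsed  # a validated RestaurantPick instance
print(pick.name, "->", pick.reason)
```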

Having said all this, I still highly recommend starting with LangChain rather than building everything from scratch yourself!

More to come…

