Normally, when searching for a new home, you pick a city and a budget, and then a real estate agent finds some homes for you to look at.

But what if instead, you could make a computer program do the home searching for you?

What if you could stretch out your home search across space and time? Instead of focusing on one area your real estate agent knows about for a month or so, what if you could search everywhere for many months?

With code, we can search many areas at once, and keep searching them for long periods of time, to find the perfect home. Read on for how I did it.

(Skip right to my example code if you are interested)

Pseudocode

Where would we even begin?

Let’s write some pseudocode:

# First let's get every home we might be interested in,
# and save it to a Pandas DataFrame
df = get_all_homes_for_sale()
# Save the "kyle score" for each home
df["kyle_score"] = df.apply(compute_kyle_score, axis=1)
# Next let's get the top ten
top_homes = df.sort_values(by="kyle_score", ascending=False).head(10)
# And print the top ten
# (iterating a DataFrame directly yields column names, so use iterrows)
for _, home in top_homes.iterrows():
    print(home)

Next we just have to write some actual code.

The three parts we need are:

  1. Getting every home for sale
  2. Applying a custom score to every home
  3. Sorting and printing our top homes

Getting Every Home For Sale

The HomeHarvest Python library is perfect for this.

It scrapes Realtor.com and returns a Pandas DataFrame for easy data analytics.

Scraping is generally legal, but be easy on their API and sleep between requests.

[Screenshot: HomeHarvest library in action in a notebook]

I used this library with a zip code as the unit of scraping, and memoized each zip code's results to prevent repeat API calls.
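As a rough sketch of that memoization idea (not the reference code itself), the wrapper below caches each zip code's results on disk and sleeps after every real request. The names `make_cached_fetcher`, `fetch`, and `cache_dir`, the pickle-file layout, and the 5-second delay are all assumptions made for illustration.

```python
import pickle
import time
from pathlib import Path
from typing import Any, Callable


def make_cached_fetcher(
    fetch: Callable[[str], Any],
    cache_dir: Path,
    delay_seconds: float = 5.0,
) -> Callable[[str], Any]:
    """Wrap `fetch` so each zip code is scraped at most once."""
    cache_dir.mkdir(parents=True, exist_ok=True)

    def cached_fetch(zip_code: str) -> Any:
        cache_file = cache_dir / f"{zip_code}.pkl"
        if cache_file.exists():
            # Cache hit: no network call, so no need to sleep.
            with cache_file.open("rb") as f:
                return pickle.load(f)
        result = fetch(zip_code)
        with cache_file.open("wb") as f:
            pickle.dump(result, f)
        # Be polite: pause after every real request (hypothetical delay).
        time.sleep(delay_seconds)
        return result

    return cached_fetch
```

With HomeHarvest, `fetch` could be something like `lambda z: scrape_property(location=z, listing_type="for_sale")`, but check the library's README for the current signature. Since listings change over time, you would also want to clear the cache directory whenever you want fresh data.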

Applying a Custom Score

The key to a good home-finding program is encoding your preferences quantitatively.

For me, this means a function that takes a home as input and returns a kyle_score.

In my reference code I take into account things like:

  • Walk/Bike/Transit Score
  • Scores for nearby amenities (LocalLogic / Yelp)
  • School Scores
  • Scores for how close the home is to a Dog Park
  • Scores for how close the home is to a Library
  • Scores for how close the home is to a Farmers Market
  • Bonus or negative scores for particular keywords
  • Functions for scoring the lot size
  • Functions for scoring the home size, beds, and baths
  • Functions for scoring the price of the home

The real key is weighting these scores so that no single one dominates. You don’t want a home with fire damage to show up in the top 10 just because it has 10 bathrooms (what a deal!).

A helpful hint is to have the scoring function return a tuple of (score, explanation), so that you can print a human-readable explanation of what makes up each score:

[Screenshot: epic home report example]
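To make the (score, explanation) idea concrete, here is a minimal sketch of a weighted scoring function. The field names (`walk_score`, `beds`, `price`, `description`), the weights, and the keywords are hypothetical placeholders, not the ones from my reference code. You would substitute whatever columns and preferences you actually have.

```python
from typing import Any, Mapping, Tuple

# Hypothetical weights: tune these until the top ten looks right to you.
WEIGHTS = {"walk_score": 0.5, "beds": 5.0, "price": -0.0001}
# Hypothetical keyword bonuses and penalties, matched against the description.
KEYWORD_BONUSES = {"fenced": 10.0, "fire damage": -100.0}


def compute_kyle_score(home: Mapping[str, Any]) -> Tuple[float, str]:
    """Return (score, explanation) for one home."""
    total = 0.0
    parts = []
    for field, weight in WEIGHTS.items():
        value = home.get(field)
        if value is None:
            continue  # missing data contributes nothing
        points = weight * float(value)
        total += points
        parts.append(f"{field}={value} -> {points:+.1f}")
    description = str(home.get("description") or "").lower()
    for keyword, bonus in KEYWORD_BONUSES.items():
        if keyword in description:
            total += bonus
            parts.append(f"keyword '{keyword}' -> {bonus:+.1f}")
    return total, "; ".join(parts)
```

Because the function only uses `.get`, it should work on a plain dict or on a DataFrame row. The large negative keyword weight is what keeps the ten-bathroom fire-damaged house out of the top ten.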

Sorting and Printing

Then, all we have to do is sort every home for sale based on that metric. Your dream home is probably in the top 10!

Since these homes are in DataFrame format, this is as simple as:

top_ten = df.sort_values(by="kyle_score", ascending=False).head(10)

You can also write some code to run the open (macOS) or xdg-open (Linux) command to open a web browser tab for each home of interest.

Just sleep a bit between tab opens so that Realtor.com doesn’t start giving you captchas.

[Screenshot: epic home report listings]
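Here is one way the tab-opening step might look. Instead of shelling out to open / xdg-open, this sketch swaps in Python's standard `webbrowser` module, which does the same job across platforms. The function name `open_listings` and the 3-second delay are assumptions.

```python
import time
import webbrowser
from typing import Callable, Iterable


def open_listings(
    urls: Iterable[str],
    delay_seconds: float = 3.0,
    opener: Callable[[str], object] = webbrowser.open_new_tab,
) -> int:
    """Open each URL in a browser tab, pausing between tabs. Returns the count opened."""
    opened = 0
    for url in urls:
        if opened:
            # Pause between tabs (not before the first) to avoid captchas.
            time.sleep(delay_seconds)
        opener(url)
        opened += 1
    return opened
```

You would pass in the listing URLs from your top homes, for example a URL column from the DataFrame (I believe HomeHarvest's output includes one called `property_url`, but verify against your own data).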

Sprinkle In Some AI

The ability to run local LLMs opens up a world of really interesting possibilities with a project like this.

You could use a local vision model to look at the pictures of the home and answer questions like:

  • Would this be a good home for chickens?
  • Is there a fence around the home?
  • Is the primary bedroom on the first floor?

Not that I would trust any of the answers to those questions…

The only AI I actually used was my local LLM, which I asked to write a 3-5 word summary of each listing for reporting purposes.

Conclusion

I wouldn’t exactly call this data science, but this is a really fun way to find your perfect dream home.

The biggest gotcha is the quality of MLS (Multiple Listing Service) data. This program will find lots of MLS mistakes.
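One way to weed some of those out is a quick plausibility check before scoring. This is only a sketch: the field names (`price`, `sqft`, `beds`) and every threshold below are hypothetical, and the mistakes you actually run into will depend on your market.

```python
from typing import Any, List, Mapping


def looks_suspicious(home: Mapping[str, Any]) -> List[str]:
    """Return reasons a listing's data looks wrong (empty list if none)."""
    reasons = []
    price = home.get("price")
    sqft = home.get("sqft")
    beds = home.get("beds")
    if price is None or price <= 0:
        reasons.append("missing or non-positive price")
    if sqft is not None and sqft <= 0:
        reasons.append("non-positive square footage")
    # Hypothetical floor: under $10 per square foot is probably a typo.
    if price and sqft and sqft > 0 and price / sqft < 10:
        reasons.append("price per square foot under $10")
    # Hypothetical ceiling: more than 20 bedrooms is probably a typo.
    if beds is not None and beds > 20:
        reasons.append("more than 20 bedrooms")
    return reasons
```

Returning the reasons rather than a bare True/False makes it easy to eyeball what got filtered and loosen a rule that turns out to be too strict.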

But if you can weed out the mistakes, perfect your personal scoring metric, and avoid Realtor.com rate limiting, then you too can write your own epic home search assistant!

