The Real Estate Data Problem: Aggregating Property Records Across 35+ Counties

Every Texas county stores property records differently. Here's what it takes to pull them all into one searchable system.

If you work in Texas real estate, you already know the drill. You need property tax data for a parcel in Harris County, so you go to HCAD's website. Then you need a record from Travis County, so you visit TCAD. Bexar County? BCAD. Dallas? DCAD. Each one has a different website. Different search interface. Different data format. Different way of organizing the same basic information.

Multiply that across 35+ counties, and you've got a full-time data problem on your hands.

We've spent years building property data aggregation systems for Texas counties. We've written scrapers for over 25 county appraisal districts, built graph databases for land ownership relationships, and designed portals that pull judicial records and tax delinquency data into a single searchable system. This post is about what that work actually looks like behind the scenes, and why it's harder than most people expect.

Every County Is Its Own Universe

Here's the part that surprises people who haven't done this work: there's no standard. Texas has 254 counties, and each county appraisal district runs its own website with its own technology stack, its own data schema, and its own quirks.

Some counties run old ASP.NET WebForms applications from the mid-2000s. Others have modern React frontends. Some require you to solve CAPTCHAs before returning results. A few still use HTML table layouts that look like they were built in FrontPage.

The data itself varies too. One county might list the property owner as "SMITH, JOHN A" while another uses "John A. Smith" and a third stores it as "SMITH JOHN." Legal descriptions follow different conventions. Assessed values appear in different fields with different labels. Some counties provide tax payment history going back decades. Others give you last year and that's it.

This isn't a minor inconvenience. For firms working across multiple counties, pulling a single property report can mean visiting four or five different websites, searching with slightly different parameters on each one, and then manually reconciling the results into something coherent.

The Manual Lookup Problem

A property research firm we worked with had staff spending 15-20 minutes per property just gathering basic tax and ownership data. When you're processing hundreds of properties per month, that's hundreds of hours burned on data collection that adds zero analytical value.

And that's the best case. When county websites go down (which happens more than you'd think), or when they change their layout without warning (which also happens regularly), that 15-minute lookup turns into 30 minutes of troubleshooting. Or it just doesn't happen that day.

Dealing with scattered county data? Let's talk about building a unified property data system.

What a County Scraper Actually Does

Building a scraper for a single county tax website sounds straightforward. Hit the URL, parse the HTML, extract the data. In practice, each county is its own engineering project.

Take Harris County (HCAD) as an example. It's one of the largest appraisal districts in the state, and its website has gone through multiple redesigns over the years. The scraper needs to handle their specific search flow, wait for JavaScript-rendered content, parse their particular table structure, and extract property details, ownership records, assessed values, and tax history into a normalized format.

Now take a smaller county. Maybe their site is a simple PHP application with server-rendered HTML. The scraper for that county looks completely different. Different request patterns, different HTML structure, different field names, different pagination.

Each of our 25+ county scrapers is essentially a custom integration. They share a common framework (we use Python with Scrapy and ScrapyRT for the HTTP API layer), but the actual extraction logic is unique to each county.
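
To make that concrete, here's a minimal sketch of what one of these per-county spiders can look like. The county name, URL, query parameters, and CSS selectors are placeholders, not any real appraisal district's markup; the point is that everything inside parse_detail is what changes from county to county.

```python
# A stripped-down sketch of a per-county Scrapy spider. The URL, query
# parameter, and CSS selectors below are placeholders -- every real
# county site needs its own versions of these.
import scrapy


class ExampleCountySpider(scrapy.Spider):
    name = "example_county"

    def __init__(self, account_number=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.account_number = account_number

    def start_requests(self):
        # Hypothetical search endpoint; real districts each have their own.
        url = f"https://appraisal.example-county.tx.us/search?acct={self.account_number}"
        yield scrapy.Request(url, callback=self.parse_detail)

    def parse_detail(self, response):
        # Selectors are illustrative; this extraction logic is what makes
        # each county spider a custom integration.
        yield {
            "county": "Example",
            "account_number": self.account_number,
            "owner_name": response.css("#ownerName::text").get(default="").strip(),
            "situs_address": response.css("#situsAddress::text").get(default="").strip(),
            "assessed_value": response.css("#assessedValue::text").get(default="").strip(),
        }
```

The framework handles crawling, retries, and output; the selectors and search flow are rewritten for each district.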

The Technical Stack

Our county scraping infrastructure runs on Python, Scrapy, ScrapyRT, and Docker. The architecture looks like this:

  • Individual spider per county - each one handles that county's specific website structure and quirks
  • ScrapyRT REST API - wraps the scrapers in an HTTP API so other systems can request property data on demand
  • Docker containers - each county scraper runs in its own container for isolation and independent deployment
  • Data normalization layer - takes the raw scraped data from different formats and maps it into a single consistent schema

This design means we can update a single county's scraper without touching any others. When Tarrant County redesigns its website (and it will), we update one spider and redeploy one container. The rest of the system doesn't know or care.
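
Here's roughly how a downstream application requests data through that ScrapyRT layer. The host, port, spider name, and target URL are placeholders for whatever a real deployment exposes; the /crawl.json endpoint and the "items" key in the response are standard ScrapyRT behavior.

```python
# Hypothetical client call against a ScrapyRT container. ScrapyRT exposes a
# /crawl.json endpoint that runs the named spider and returns scraped items
# as JSON; host, port, spider name, and URL here are placeholders.
import requests

resp = requests.get(
    "http://scraper-example-county:9080/crawl.json",
    params={
        "spider_name": "example_county",
        # The page the spider should start from.
        "url": "https://appraisal.example-county.tx.us/search?acct=1234567890",
    },
    timeout=60,
)
resp.raise_for_status()
payload = resp.json()

# ScrapyRT returns the scraped items under the "items" key.
for item in payload.get("items", []):
    print(item["owner_name"], item["assessed_value"])
```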

Why Scrapers Break (And How to Handle It)

County websites change. Sometimes it's a full redesign. Sometimes it's a small tweak to the HTML structure. Sometimes they add a CAPTCHA. Sometimes their SSL certificate expires and the site just stops responding for a week.

We've seen all of these. The key is building monitoring into the system so you know when a scraper breaks before your users notice. Automated health checks run against every county scraper on a schedule. When one starts returning errors or missing data fields, it triggers an alert and gets fixed.
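
A sketch of what that kind of scheduled check can look like: re-scrape a known parcel for each county and alert when the request fails or required fields come back empty. The host, spider names, test parcels, and alert hook here are placeholders, not our production monitoring.

```python
# Health-check sketch: re-scrape a known parcel per county and alert when
# required fields go missing. Host, spider names, test parcels, and the
# alert hook are placeholders.
import requests

REQUIRED_FIELDS = {"owner_name", "situs_address", "assessed_value"}

# Hypothetical known-good test parcels, one per county scraper.
TEST_PARCELS = {
    "example_county": "https://appraisal.example-county.tx.us/search?acct=1234567890",
}


def alert(message: str) -> None:
    # Stand-in for a Slack/PagerDuty/email integration.
    print(f"ALERT: {message}")


def check_spider(spider_name: str, url: str) -> None:
    try:
        resp = requests.get(
            "http://scrapyrt-host:9080/crawl.json",
            params={"spider_name": spider_name, "url": url},
            timeout=120,
        )
        resp.raise_for_status()
        items = resp.json().get("items", [])
    except requests.RequestException as exc:
        alert(f"{spider_name}: request failed ({exc})")
        return

    if not items:
        alert(f"{spider_name}: returned no items")
        return

    missing = REQUIRED_FIELDS - set(items[0])
    if missing:
        alert(f"{spider_name}: missing fields {sorted(missing)}")


if __name__ == "__main__":
    for name, test_url in TEST_PARCELS.items():
        check_spider(name, test_url)
```

A check like this runs against every county scraper on a schedule, so a broken spider surfaces as an alert rather than as a user complaint.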

This is the maintenance burden that most people underestimate. Building the initial scrapers is a project. Keeping 25+ scrapers running against websites that change without warning? That's an ongoing operation.

Normalizing the Data

Getting the data out of county websites is half the problem. The other half is making it useful.

When you pull property records from 35+ different sources, you need to normalize everything into a common format. Owner names, property addresses, legal descriptions, assessed values, tax amounts, payment status. All of it needs to follow consistent rules so you can search, filter, sort, and compare across counties.

This normalization work is where a lot of real estate data projects fail. It's tedious. It requires deep knowledge of how each county formats its data. And edge cases are everywhere.

For example: how do you handle a property that spans two counties? What about trusts and LLCs where the owner name varies by filing? How do you reconcile a property that was reassessed mid-year and shows two different values? These aren't hypotheticals. They're the kinds of issues that come up weekly when you're aggregating data at this scale.
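
To make the owner-name problem concrete, here's a minimal sketch of the kind of standardization rule that maps "SMITH, JOHN A" and "John A. Smith" to the same comparable key. Note that the comma-less "SMITH JOHN" form still needs fuzzier matching on top of this, and real-world rules (suffixes, trusts, LLCs, multiple owners) go far beyond a few lines.

```python
# Minimal owner-name normalization sketch: collapse punctuation and case,
# reorder "LAST, FIRST" forms, and emit a comparable key. Real-world rules
# (suffixes, trusts, LLCs, multiple owners) are far messier than this.
import re


def normalize_owner_name(raw: str) -> str:
    name = raw.strip().upper()
    # "SMITH, JOHN A" -> "JOHN A SMITH"
    if "," in name:
        last, _, rest = name.partition(",")
        name = f"{rest.strip()} {last.strip()}"
    # Drop punctuation and squeeze whitespace: "JOHN A. SMITH" -> "JOHN A SMITH"
    name = re.sub(r"[^\w\s]", "", name)
    return re.sub(r"\s+", " ", name).strip()


assert normalize_owner_name("SMITH, JOHN A") == "JOHN A SMITH"
assert normalize_owner_name("John A. Smith") == "JOHN A SMITH"
```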

Learn more about our custom software development approach.

Why Property Data Is Graph-Shaped

One of the more interesting technical decisions we've made in this space is using a graph database (Neo4j) for property data management.

Think about what property data actually looks like. A parcel has an owner. That owner might own other parcels. Parcels have liens. Liens have holders. Properties transfer between entities. Those entities have relationships to other entities.

In a traditional relational database, modeling these relationships requires complex JOIN queries across many tables. In a graph database, the relationships are first-class citizens. Querying "show me all properties owned by entities connected to this LLC" goes from a nightmare multi-table JOIN to a straightforward graph traversal.
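
As an illustration of what that traversal looks like, here's a sketch using Cypher through the official Neo4j Python driver. The node labels, relationship types, and property names are assumptions for the example, not our production schema, and in the actual system this kind of query sits behind the GraphQL API described below.

```python
# Sketch of an ownership-graph traversal using the Neo4j Python driver.
# The node labels (Entity, Parcel), relationship types (OWNS, MEMBER_OF),
# and property names are illustrative, not a real production schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# "All parcels owned by entities connected (within two hops) to this LLC."
QUERY = """
MATCH (llc:Entity {name: $llc_name})-[:MEMBER_OF|OWNS*1..2]-(related:Entity)
MATCH (related)-[:OWNS]->(p:Parcel)
RETURN DISTINCT p.account_number AS account, p.situs_address AS address
"""

with driver.session() as session:
    for record in session.run(QUERY, llc_name="EXAMPLE HOLDINGS LLC"):
        print(record["account"], record["address"])

driver.close()
```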

We built a GraphQL API on top of Neo4j (using Node.js and Apollo Server) with a React frontend that includes interactive Leaflet maps. Users can search for a property and immediately see the ownership graph, related parcels, transfer history, and geographic context on a map.

This kind of relational analysis is what makes aggregated property data actually valuable. It's not just about looking up one parcel's tax bill. It's about seeing patterns across ownership structures, identifying delinquent properties connected to common entities, and understanding the full picture of a real estate portfolio.

Want to see how a unified data platform could work for your portfolio? Book a discovery session.

Real-World Applications

The property data systems we've built are used in a few different ways.

Property Tax Law

Legal professionals working property tax cases need data from multiple counties to build their case files. Our systems pull that data automatically, generate petition documents, and integrate with case management software (like Clio). What used to take a paralegal an hour per case now takes a few clicks.

Investment Research

Real estate investment firms evaluating properties across multiple Texas markets use aggregated data to identify opportunities. Tax delinquency trends, assessed value changes, ownership patterns. Having all of this in one searchable system, with map overlays showing property boundaries, changes the speed of research from days to hours.

Tax Delinquency Tracking

Tracking tax delinquent properties across counties is nearly impossible manually. Our portal systems pull delinquency data alongside judicial records, letting users identify properties in distress and understand their full legal and financial context.

Portfolio Management

For firms managing properties across many counties, having a single portal that shows current tax status, assessed values, and ownership records for every property in the portfolio eliminates the daily multi-county website juggling act.

What a Unified Property Data System Looks Like

If you're building (or thinking about building) a property data aggregation system, here's what we've found works.

The Data Collection Layer

Individual scrapers per county, each containerized and independently deployable. A REST API layer that lets other applications request data. Health monitoring and alerting when scrapers break. Scheduled refresh cycles to keep data current.

The Normalization Layer

Rules for mapping county-specific formats to a common schema. Owner name standardization. Address parsing and geocoding. Deduplication logic for properties that appear in multiple datasets.
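
As a sketch of what "a common schema" means in practice, here's the kind of normalized record those mapping rules might target. The field names and types are illustrative, one possible consistent shape rather than a prescribed standard.

```python
# Illustrative target schema for the normalization layer. Field names and
# types are an example of one consistent shape, not a prescribed standard.
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class NormalizedParcel:
    county: str                      # e.g. "Harris"
    account_number: str              # county appraisal account / parcel ID
    owner_name: str                  # standardized owner name
    situs_address: str               # parsed, geocodable street address
    legal_description: str
    assessed_value: Decimal
    tax_year: int
    tax_due: Decimal
    is_delinquent: bool
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    last_payment_date: Optional[date] = None
```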

The Storage Layer

Relational databases for structured property and tax data. Graph databases for ownership and relationship mapping. Full-text search for flexible querying across all fields.

The Application Layer

Search interfaces that work across all counties simultaneously. Interactive maps with property boundary overlays (we use Leaflet.js with OpenStreetMap, plus GeoJSON for parcel geometries). Ownership graph visualization. Export capabilities for reports and analysis. Integration points with case management, CRM, or other business systems.
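
For the map overlays, the application hands Leaflet parcel geometries as GeoJSON. Here's a minimal sketch of building one such Feature from a normalized record; the coordinates and property fields are placeholders, and real boundaries come from county GIS data.

```python
# Build a GeoJSON Feature for one parcel so a Leaflet layer can render its
# boundary and show key fields in a popup. Coordinates are placeholders.
import json


def parcel_to_geojson_feature(parcel: dict, boundary: list[list[float]]) -> dict:
    return {
        "type": "Feature",
        "geometry": {
            "type": "Polygon",
            # GeoJSON polygons are a list of rings; each ring is [lon, lat] pairs
            # and must close on its starting point.
            "coordinates": [boundary],
        },
        "properties": {
            "account_number": parcel["account_number"],
            "owner_name": parcel["owner_name"],
            "assessed_value": parcel["assessed_value"],
        },
    }


feature = parcel_to_geojson_feature(
    {"account_number": "1234567890", "owner_name": "JOHN A SMITH", "assessed_value": "250000"},
    # A tiny placeholder square; real parcel boundaries come from county GIS data.
    [[-95.370, 29.760], [-95.370, 29.761], [-95.369, 29.761], [-95.369, 29.760], [-95.370, 29.760]],
)
print(json.dumps(feature, indent=2))
```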

See our work with real estate companies.

The Build-vs-Buy Question

Off-the-shelf property data platforms exist. Companies like CoreLogic, ATTOM, and BatchData offer API access to aggregated property data. We've integrated with some of these (we use BatchData's API in one of our portal systems).

These services are great for basic lookups. But they have limitations:

  • Data freshness varies. Some sources update monthly, others quarterly. If you need current tax payment status, you might need direct county access.
  • Coverage gaps. Not every county and every data field is available through third-party APIs.
  • Cost at scale. API pricing gets expensive fast when you're querying thousands of properties regularly.
  • Limited customization. You get the data they collect, in the format they provide.

For many use cases, a hybrid approach works best. Use third-party APIs for broad coverage and supplement with direct county scrapers for specific markets or data fields where you need fresher or more detailed information than the APIs provide.
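
A sketch of what that hybrid lookup can look like: try the third-party API first, then fill in or refresh the fields that need direct county access. Both client functions below are hypothetical placeholders, not a vendor SDK or our production code.

```python
# Hybrid lookup sketch: prefer the third-party API, then fill gaps from a
# direct county scrape. Both client functions are hypothetical placeholders.

def lookup_via_vendor_api(account_number: str) -> dict:
    """Placeholder for a third-party property data API call."""
    ...


def lookup_via_county_scraper(county: str, account_number: str) -> dict:
    """Placeholder for a direct ScrapyRT call to the county spider."""
    ...


# Fields where staleness matters most, so they justify a direct county hit.
FRESH_FIELDS = {"tax_due", "is_delinquent"}


def get_property(county: str, account_number: str) -> dict:
    record = lookup_via_vendor_api(account_number) or {}
    # If current payment-status fields are missing, go straight to the county source.
    if not FRESH_FIELDS.issubset(record):
        record.update(lookup_via_county_scraper(county, account_number) or {})
    return record
```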

Getting Started

If your team is spending hours pulling property records from county websites, or if you're relying on stale data because manual collection can't keep up, there's a better way.

We've built these systems from the ground up, starting with the messy reality of county data and working backward to clean, searchable, normalized information. We know which counties are easy to scrape and which ones are painful. We know where the edge cases hide. And we know how to build systems that keep running when county websites inevitably change.

The first step is understanding your specific data needs, county coverage requirements, and how the data fits into your business workflows. Our discovery process maps all of this out before we write a line of code.

Schedule a discovery session to talk about your property data challenges. We'll walk through what a unified system could look like for your specific situation.

Or reach out if you've got questions about property data aggregation, county tax scrapers, or real estate data systems.

Your property research shouldn't mean visiting 35 different websites. Let's fix that.
