Processing Large JSON Files in PHP Without Running Out of Memory

Posted by cyfervoid 1 month ago


Working with JSON files is common in modern PHP applications. Whether you're importing product catalogs, syncing organization records, processing API exports, or migrating data between systems, JSON is often the preferred format.

For small files, most developers use a straightforward approach:

$json = file_get_contents('data.json');
$data = json_decode($json, true);

While this works perfectly for small datasets, it can become a serious problem once files grow to hundreds of megabytes or even several gigabytes.

We've encountered this issue in production environments where importing large datasets caused memory exhaustion, slow execution times, and application crashes.

Here's what we've learned about handling large JSON files efficiently.

The Problem with file_get_contents()

The biggest issue is that file_get_contents() reads the entire file into memory as one string before json_decode() even runs.

Let's assume:

JSON file size: 500 MB
PHP memory limit: 256 MB

The import fails before any record is processed: the 500 MB string returned by file_get_contents() already exceeds the 256 MB limit.

Even if the memory limit is raised, the decoded structure usually needs considerably more memory than the file itself, because every value becomes a PHP array element with its own overhead.

A 500 MB JSON file can easily require over 1 GB of memory after decoding.

Common errors include:

Allowed memory size exhausted

Fatal error: Out of memory

Why Increasing Memory Isn't the Best Solution

A common reaction is to increase the memory limit:

ini_set('memory_limit', '2048M');

While this may temporarily solve the issue, it doesn't scale.

As data grows:

Memory usage continues increasing
Imports become slower
Server stability decreases
Multiple concurrent imports become risky

The goal should be to reduce memory usage, not to keep raising limits.

Process Data in Chunks

If the source allows it, divide data into smaller files.

Instead of:

organizations.json

consider:

organizations_1.json
organizations_2.json
organizations_3.json

Processing smaller datasets offers several benefits:

Lower memory consumption
Faster recovery when failures occur
Easier monitoring
Better queue management

In Laravel projects, chunked processing often integrates well with queues.
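
As a rough sketch of the idea, assuming the export has already been split into files named like organizations_1.json and that each file holds a JSON array of records, the files can be decoded one at a time so that only one chunk is held in memory. The function name and the file layout are illustrative assumptions, not part of any library.

```php
<?php
declare(strict_types=1);

/**
 * Yield records from a set of chunk files, decoding one file at a time.
 * Hypothetical helper: the glob pattern and file layout are assumptions.
 */
function readChunkedRecords(string $pattern): Generator
{
    $files = glob($pattern) ?: [];
    // Natural sort so organizations_2.json comes before organizations_10.json.
    natsort($files);

    foreach ($files as $file) {
        $json = file_get_contents($file);
        if ($json === false) {
            throw new RuntimeException("Cannot read {$file}");
        }
        // JSON_THROW_ON_ERROR (PHP 7.3+) turns malformed JSON into an exception.
        $records = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        unset($json); // release the raw string before iterating

        foreach ($records as $record) {
            yield $record;
        }
        // $records is overwritten on the next pass, so the previous chunk can be freed.
    }
}

// Usage:
// foreach (readChunkedRecords(__DIR__ . '/organizations_*.json') as $record) {
//     // validate and store $record here
// }
```

Peak memory then depends on the size of the largest chunk rather than on the size of the whole dataset.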

Stream Data Instead of Loading Everything

Streaming is one of the most effective solutions.

Rather than loading the entire file into memory, records are processed as they are read.

This approach dramatically reduces memory consumption.

A streaming parser only keeps a small portion of the file in memory at any given time.

Benefits include:

Roughly constant memory usage, regardless of file size
Better scalability
Ability to process gigabyte-sized files
Improved server stability

For large imports, streaming should usually be the preferred approach.
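
PHP's built-in json_decode() has no streaming mode, so streaming a single large JSON array generally requires a dedicated streaming parser library. The same idea can be sketched with core functions alone if the export can be produced as newline-delimited JSON (one record per line). That is an assumption about the source, not something every exporter offers, and the function name below is illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Lazily yield one decoded record per line of a newline-delimited JSON file.
 * Only the current line is held in memory, regardless of file size.
 */
function streamJsonLines(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue; // skip blank lines
            }
            yield json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        }
    } finally {
        // Runs when iteration finishes or the generator is destroyed early.
        fclose($handle);
    }
}

// Usage: records are handled one by one as they are read.
// foreach (streamJsonLines('organizations.ndjson') as $record) {
//     // validate and store $record here
// }
```

Because the generator yields records lazily, the calling code looks like an ordinary foreach loop while the file is read incrementally.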

Move Heavy Processing to Queues

Large imports should rarely run during a web request.

Imagine a user uploads a large file and waits for processing to finish.

Problems may include:

Request timeouts
Server resource spikes
Poor user experience

Instead, queue the work:

ProcessOrganizationImport::dispatch($filePath);

The user receives immediate feedback while the queue worker handles processing in the background.

This approach is more reliable and scales significantly better.

Batch Database Operations

Another common performance issue occurs during database inserts.

Many developers start with:

foreach ($records as $record) {
    Organization::create($record);
}

This generates a separate INSERT query for every record, plus the overhead of creating an Eloquent model each time.

When importing hundreds of thousands of records, performance suffers dramatically.

A better approach is batch insertion:

DB::table('organizations')->insert($batch);

Benefits include:

Fewer database queries
Faster imports
Reduced database load
Better scalability
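
To build the batches themselves, one option is a small generator that groups any stream of records into fixed-size arrays. This is a minimal sketch in plain PHP: the helper name and the batch size of 500 in the usage comment are assumptions to tune for your own row size and database.

```php
<?php
declare(strict_types=1);

/**
 * Group any iterable of records into arrays of at most $size items.
 */
function inBatches(iterable $records, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }

    $batch = [];
    foreach ($records as $record) {
        $batch[] = $record;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }

    if ($batch !== []) {
        yield $batch; // final, possibly smaller batch
    }
}

// Usage inside a Laravel import (assumes the DB facade and an organizations table):
// foreach (inBatches($records, 500) as $batch) {
//     DB::table('organizations')->insert($batch);
// }
```

Keep in mind that the query builder's insert() bypasses Eloquent, so model events do not fire and created_at / updated_at are not filled in automatically. Add those values to each row if the table needs them.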

Monitor Memory Usage

When optimizing imports, it's useful to monitor memory usage during execution.

PHP provides simple functions:

echo memory_get_usage(true);

and

echo memory_get_peak_usage(true);

These metrics help identify bottlenecks and confirm whether optimizations are working.
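
Raw byte counts are hard to read in logs, so a small formatting helper can be useful. The sketch below uses only the two functions mentioned above; the helper names and the log label are illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Format a byte count as a human-readable string, e.g. 1536 -> "1.50 KB".
 */
function formatBytes(int $bytes): string
{
    $units = ['B', 'KB', 'MB', 'GB'];
    $value = (float) $bytes;
    $unit = 0;
    while ($value >= 1024 && $unit < count($units) - 1) {
        $value /= 1024;
        $unit++;
    }
    return sprintf('%.2f %s', $value, $units[$unit]);
}

/**
 * Build a one-line memory report suitable for a log entry.
 */
function memoryReport(string $label): string
{
    return sprintf(
        '[%s] current: %s, peak: %s',
        $label,
        formatBytes(memory_get_usage(true)),
        formatBytes(memory_get_peak_usage(true))
    );
}

// Usage: log a line every N records and once at the end of the import.
// error_log(memoryReport('after 10,000 records'));
```

If the peak value stays flat while the record count grows, the import is streaming as intended. A peak that climbs with the record count usually means records are being accumulated somewhere.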

Use Database Indexes Carefully

Large imports often involve duplicate detection.

For example:

Organization::where('symphony_id', $id)->first();

Without proper indexing, lookup performance degrades rapidly as the table grows.

Important columns used for searching should generally be indexed.

Examples:

symphony_id
email
external_id
kvk_number
slug

Proper indexing can reduce lookup times from seconds to milliseconds.

Consider Incremental Imports

Many systems repeatedly import the same data.

Instead of reprocessing everything:

Import 1,000,000 records daily

consider:

Import only changed records

Benefits include:

Reduced processing time
Lower server load
Faster synchronization
Better overall performance

Incremental imports become increasingly valuable as datasets grow.
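
How changed records are detected depends on the source. If the export carries no reliable "last modified" field, one possible approach is to store a content hash per record and compare it on the next run. The sketch below assumes each record has an external_id field and that the hashes from the previous import are available as an array; both are assumptions, and the function name is illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Yield only records that are new or whose content differs from the last import.
 *
 * $knownHashes maps a record's ID to the hash stored after the previous
 * import (hypothetical storage, e.g. a column on the table).
 */
function changedRecords(iterable $records, array $knownHashes, string $idKey = 'external_id'): Generator
{
    foreach ($records as $record) {
        // Sort top-level keys so the same data produces the same hash
        // even if the exporter changes the field order.
        ksort($record);
        $hash = hash('sha256', json_encode($record, JSON_THROW_ON_ERROR));

        $id = $record[$idKey];
        if (($knownHashes[$id] ?? null) !== $hash) {
            yield $id => ['record' => $record, 'hash' => $hash];
        }
    }
}

// Usage: write each yielded record, then store its hash for the next run.
// foreach (changedRecords($records, $knownHashes) as $id => $change) {
//     // upsert $change['record'] and persist $change['hash']
// }
```

This still reads every record from the file, but it skips the database writes for unchanged rows, which are often the most expensive part of an import.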

Production Checklist for Large JSON Imports

Before processing large JSON files, we typically verify:

Streaming or chunked processing is used
Queue workers are running
Database indexes are present
Batch inserts are implemented
Memory usage is monitored
Failed imports can resume safely
Logs capture processing statistics

Following these practices helps prevent unexpected failures during production imports.

Final Thoughts

Large JSON imports often work perfectly during development but become problematic in production once datasets increase in size.

Rather than relying on higher memory limits, focus on techniques that scale:

Stream data instead of loading everything
Process records in chunks
Use queues for background processing
Batch database operations
Monitor memory consumption
Optimize database indexes

These approaches not only prevent memory exhaustion but also create import systems that remain reliable as data volumes continue to grow.

If your application regularly handles large datasets, investing time in proper import architecture today can save countless hours of troubleshooting later.
