How to Upload/Import Very Large Files

So, you just got your hands on a lot of great, formatted, data, like a huge CSV of Business Listings that you now want to import into your Vantage site, or a whole bunch of ads to import into your ClassiPress site.

This type of data can be found in numerous ways, and its not the purpose of this tutorial on where to source such data sets, but instead how to work with the large sets of data that you now have and wish to import.

You are probably reading this not because you are trying to import a handful or even a few hundred rows long csv file, rather I am sure you have some 10,000, 25,000 or maybe even 50,000 to 100,000+ row csv file that you are running into issues getting it to fully import successfully.

First is the good news, WordPress is built to handle this large of data and so are AppThemes themes, as well as the AppThemes theme internal data importer. That’s the good news, so that you know your not running into any bottle necks there.

It’s important to look at this situation and know that there are two different things going on:

  • You are Uploading a large file
  • You are Importing a large file

In regards to Uploading, you may need to modify server configuration so the server is prepared and setup to accept high Mb file sizes as well as the timing it would take to upload a large file size.

In regards to Importing, you may have to address additional needs. It’s possible the server will need to be configured and setup for the time and memory needed to process the large file size. This is important as the server must process through a high quantity of rows of data, repeating the import processing tasks for each row of data in the .csv.

When you look at these tasks separately, it’s helpful to see where different settings and configuration directives apply and why.

Now, onto the disclaimers, it is not always a black and white process to get your very large file to import like a hot knife through butter. There are some key factors that can differ for each situation.

  • Server Configurability
  • Server Configuration
  • Server Limitations
  • File Size
  • Exact File Contents

I will elaborate on each of these, so you know which of these apply to your situation.

Server Configurability

I want to start here, because a lot of folks are using shared hosting, like BlueHost, HostGater, DreamHost, GoDaddy, etc. There is nothing inherently wrong with using shared hosting – especially with our promoted hosting partners – but it’s always important to know the limitations of your shared hosting solution. Most providers set limitations on what aspects of server configuration you are capable of modifying or overriding. This is mainly to keep the “shared” environment fair and so you don’t just update the settings in your files and end up hogging some or all of the physical server’s resources (RAM, CPU power, Bandwidth etc). The point is with shared hosting, resources are shared. It’s not always apparent which quantity or amount of resources you have allotted or allot-able to your specific piece of the shared server pie. The lesson here is that the shared hosting providers must set configuration limitations, so that no user can unfairly overuse server resources.

If you have a virtual server or dedicated server, you may or may not have it “managed” by your hosting provider or you may have actual root access. Either way, you should have access to the needed server configuration settings that will help you prepare for large file uploads. Note, if you do have “root access”, you should be familiar and experienced in accessing and making changes to your servers configuration under the root user. If in doubt about a command that you intend to run as the root user, always consult with your hosting provider or a professional systems administrator first.

Always check with your hosting provider directly to find out the amount of configurability they offer.

Server Configuration

Regardless of your hosting solution, let’s say you have the ability to make the needed changes to the server configuration settings. The next thing to check is maximum resources. Most of the time you should be fine, but this is just another area where different users have different server resource maximums due to their chosen plan or service level etc..

Server Limitations

There are also potential other areas regarding specific server hardware, or server software or operating system or operating system version, or specific database or database version, that differs from user to user, and it is beyond the scope of this tutorial to get into all of those potential limitations, differences, catches, nuances, or other shenanigans, etc.

Import File Size and File Contents

I don’t know the exact file size, the amount of rows, or the type of data you are attempting to import. My hope is that I can provide some helpful information. Anything that requires more than what is described here may need to be handled by a professional. Please make sure to check out our AppThemes Certified Partners for in case you need help getting your large dataset properly imported.

Also note, that AppThemes does not warranty the services of the certified partners, but warmly recommends them. Make sure to check with them directly on warranty of services to be rendered if you decide to utilize one of them.

Overview of General Areas of Concern

  • Server Configurability
    • Do I have shared hosting, a virtual server, or a dedicated server?
    • What level of server configurability does my hosting provider allow me?
    • Do I have the skills/experience to make the needed server changes?
  • Server Configuration
    • What are my current server configuration settings, and what are the maximum possible settings?
    • Do I have the overall resources I need and available for my needs?
  • Server Limitations
    • Do I have a server setup that brings unique limitations?
  • File Size
    • The larger the file size, the more the server needs to be ready and able to handle it.
  • Exact File Contents
    • Does my .csv file contain many, or excessive, columns in addition to high quantity of rows?
    • Does my .csv have rows that also trigger attachement downloads, image resizing, etc?
    • Does my .csv have rows that also trigger location geocoding or some other external API for additional data/processing?

Solutions and Tips

Server Configuration

Assuming that you have the most common and recommended server setup (WordPress Server/System Requirements):

  • PHP version 5.2.4 or greater
  • MySQL version 5.0 or greater
  • Apache / Nginx

Then this tutorial will have some helpful server configuration tips, please continue reading.

(Note: If you don’t know where this file is, or how to modify it, you should contact your hosting provider or systems administrator)

To help locate your php.ini file, follow these steps. Once you locate and dive into your php.ini file, you will see the below settings set to their default values. If you find that any of the below are not already set, set them at the bottom of the file.

One other important and helpful note when attempting to locate a php.ini file on your server, you will need to update the correct one that is being applied to the virtual host/document root where your site is being served from. In order to locate the applicable php.ini, add this:

phpinfo();
exit;

to the top of your index.php file in your wordpress install.

So the first few lines will look like this:

<?php
phpinfo(); 
exit;

Then reload the page and you will see a section for “Loaded Configuration File”, and there is your applicable php.ini file to use.

It will look like this:

Loaded Configuration File /path/to/the/file/php.ini

For the sake of simplicity and to ensure the highest integrity of these configuration directives, it is most recommended to place the below directives within the php.ini file on your server, even though some may indeed work in your .htaccess file.

Note, always make a backup of any file before making any changes to it. We strongly recommend that you make a backup of your php.ini file before making any of theĀ changes below.

Important Configuration Directives

These are the most applicable PHP configuration directives that you should consider modifying in your php.ini file, depending on your specific needs, to help get your server configured to handle your large file upload needs.

  • memory_limit
    • Your server memory limit may be set as low as 32M or something like that, so you could try a jump to 128M, and if that is still not enough and you are experiencing out of memory errors, you can try higher values such as 256M, or even 512M.
    • More info: http://php.net/manual/en/ini.core.php#ini.memory-limit
  • max_input_time
    • This is usually defaulted to 60, meaning 60 seconds. While you are making this large file upload, your import, you should change this to a very large number, and how large depends on many factors, but in order to make sure this is not the bottle neck you could set it for a few hours. Example for 5 hours: 60 * 60 * 5 = 18000.
    • You should change this back to the default of 60 once you large file upload needs are finished, it should not be left at a high number as referenced above, after this one time need.
    • More info: http://php.net/manual/en/info.configuration.php#ini.max-input-time
  • max_execution_time
    • As with max_input_time, this should be changed to a very large number for the duration of your temporary large file upload needs. Set to 18000
    • You should change this back to the default of 30 once you large file upload needs are finished, it should not be left at a high number as referenced above, after this one time need.
    • More info: http://php.net/manual/en/info.configuration.php#ini.max-execution-time
  • upload_max_filesize
  • post_max_size
    • Set this bigger than your upload file size, and it must also be bigger than the upload_max_filesize directive.
    • Remember to change this back to the default or another reasonable size that is appropriate for normal file uploads for your site.
    • More info: http://php.net/manual/en/ini.core.php#ini.post-max-size

Here is how these configuration values look within the php.ini file, make sure to keep the formatting correct:

Example:

; Maximum size of POST data that PHP will accept.
post_max_size = 16M

Format

; This is a comment, anything on a line after a ";" will be
; treated as a comment.
directive_name = directive_value

File Size and Contents

If updating the directives in your php.ini was not enough to make your large file uploads work, you could also consider manipulating the large file to be better able to work with it.

Splitting the file into smaller chunks

The .csv file you are trying to upload is a simple text file, and it is full of rows (lines). If you take half of the rows and put them into a new file, you now have two file uploads but each is half as large as your original. You could split it into 3 or 4 or even more, in order to get each file to a more manageable size.

Here is an example, of what a .csv file looks like, with rows and rows of data.

"title","description","author","date","slug","status","address","phone","facebook","twitter","website","listing_category","listing_tag","lat","lng","attachments","featured_attachment","listing_duration"
"AppThemes","AppThemes is a fast growing company that employs talent from all around the world. Our diverse team consists of highly skilled WordPress developers, designers, and enthusiasts who come together to make awesome premium themes available in over two dozen different languages.","admin","2012-04-27 00:48:56","appthemes","publish","548 Market St, San Francisco, CA 94104, USA","415-287-3474","appthemes","appthemes","appthemes.com","Software","themes,wordpress",37.789903,-122.400785,"http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png, //cdn.appthemes.com/wp-content/uploads/2012/04/banner-index-vantage.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/single-listing1.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/Vantage-search.jpg","http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png",30
"AppThemes-2","AppThemes is a fast growing company that employs talent from all around the world. Our diverse team consists of highly skilled WordPress developers, designers, and enthusiasts who come together to make awesome premium themes available in over two dozen different languages.","admin","2012-04-27 00:48:56","appthemes-2","publish","548 Market St, San Francisco, CA 94104, USA","415-287-3474","appthemes","appthemes","appthemes.com","Software","themes,wordpress",37.789903,-122.400785,"http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png, //cdn.appthemes.com/wp-content/uploads/2012/04/banner-index-vantage.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/single-listing1.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/Vantage-search.jpg","http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png",30
"AppThemes-3","AppThemes is a fast growing company that employs talent from all around the world. Our diverse team consists of highly skilled WordPress developers, designers, and enthusiasts who come together to make awesome premium themes available in over two dozen different languages.","admin","2012-04-27 00:48:56","appthemes-3","publish","548 Market St, San Francisco, CA 94104, USA","415-287-3474","appthemes","appthemes","appthemes.com","Software","themes,wordpress",37.789903,-122.400785,"http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png, //cdn.appthemes.com/wp-content/uploads/2012/04/banner-index-vantage.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/single-listing1.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/Vantage-search.jpg","http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png",30
"AppThemes-4","AppThemes is a fast growing company that employs talent from all around the world. Our diverse team consists of highly skilled WordPress developers, designers, and enthusiasts who come together to make awesome premium themes available in over two dozen different languages.","admin","2012-04-27 00:48:56","appthemes-4","publish","548 Market St, San Francisco, CA 94104, USA","415-287-3474","appthemes","appthemes","appthemes.com","Software","themes,wordpress",37.789903,-122.400785,"http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png, //cdn.appthemes.com/wp-content/uploads/2012/04/banner-index-vantage.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/single-listing1.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/Vantage-search.jpg","http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png",30

When splitting up your .csv file, remember these important notes:

  • You need to copy and paste the first line, into each additional file you create. They all need that first row, that is the column headers which are needed for the import process.
  • Make sure you don’t accidentally create any duplicates, or accidentally delete any rows in all of the shuffle.

Here is our simple example .csv from above, split into 2 different files. Notice the first row/line in both is that special row with the column headers.

File 1.csv

"title","description","author","date","slug","status","address","phone","facebook","twitter","website","listing_category","listing_tag","lat","lng","attachments","featured_attachment","listing_duration"
"AppThemes","AppThemes is a fast growing company that employs talent from all around the world. Our diverse team consists of highly skilled WordPress developers, designers, and enthusiasts who come together to make awesome premium themes available in over two dozen different languages.","admin","2012-04-27 00:48:56","appthemes","publish","548 Market St, San Francisco, CA 94104, USA","415-287-3474","appthemes","appthemes","appthemes.com","Software","themes,wordpress",37.789903,-122.400785,"http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png, //cdn.appthemes.com/wp-content/uploads/2012/04/banner-index-vantage.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/single-listing1.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/Vantage-search.jpg","http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png",30
"AppThemes-2","AppThemes is a fast growing company that employs talent from all around the world. Our diverse team consists of highly skilled WordPress developers, designers, and enthusiasts who come together to make awesome premium themes available in over two dozen different languages.","admin","2012-04-27 00:48:56","appthemes-2","publish","548 Market St, San Francisco, CA 94104, USA","415-287-3474","appthemes","appthemes","appthemes.com","Software","themes,wordpress",37.789903,-122.400785,"http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png, //cdn.appthemes.com/wp-content/uploads/2012/04/banner-index-vantage.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/single-listing1.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/Vantage-search.jpg","http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png",30

File 2.csv

"title","description","author","date","slug","status","address","phone","facebook","twitter","website","listing_category","listing_tag","lat","lng","attachments","featured_attachment","listing_duration"
"AppThemes-3","AppThemes is a fast growing company that employs talent from all around the world. Our diverse team consists of highly skilled WordPress developers, designers, and enthusiasts who come together to make awesome premium themes available in over two dozen different languages.","admin","2012-04-27 00:48:56","appthemes-3","publish","548 Market St, San Francisco, CA 94104, USA","415-287-3474","appthemes","appthemes","appthemes.com","Software","themes,wordpress",37.789903,-122.400785,"http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png, //cdn.appthemes.com/wp-content/uploads/2012/04/banner-index-vantage.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/single-listing1.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/Vantage-search.jpg","http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png",30
"AppThemes-4","AppThemes is a fast growing company that employs talent from all around the world. Our diverse team consists of highly skilled WordPress developers, designers, and enthusiasts who come together to make awesome premium themes available in over two dozen different languages.","admin","2012-04-27 00:48:56","appthemes-4","publish","548 Market St, San Francisco, CA 94104, USA","415-287-3474","appthemes","appthemes","appthemes.com","Software","themes,wordpress",37.789903,-122.400785,"http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png, //cdn.appthemes.com/wp-content/uploads/2012/04/banner-index-vantage.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/single-listing1.jpg, //cdn.appthemes.com/wp-content/uploads/2012/04/Vantage-search.jpg","http://www.appthemes.com/wp-content/themes/appthemes/images/logo.png",30

Extra import tasks

There are at least a few extra tasks that the importer could be trying to accomplish other than the creation of the listing (or ad, etc) during the import. It could be helpful to opt out from these additional tasks, so that you can have the processes and resources dedicated to the most minimal and effective at getting your very large file uploaded and imported.

For example:

  • Including the column and data in the rows for “attachments”, there is an additional step for the importer which is to fetch the remote image/file and possibly even image resizing.
  • Opting to geocode listings that don’t have the lat and lng columns set, which would give the importer the step to do an API call to the geocoding service to fetch the coordinates of the address in that row.

These are just a couple of examples that could add considerable extra processing. The extra processing grows depending on how many rows require that extra processing.

Considering these “extra” tasks could be what’s needed to handle a very large or problematic file. It’s just one more idea to try to help with the task of mass importing. Other means could then incrementally take care of the the tasks you potentially removed from the upload and import step.

Like this tutorial? Subscribe and get the latest tutorials delivered straight to your inbox or feed reader.

Your rating: none
Rating: 4.4 - 9 votes

Popular Add-ons

Dotpay SMS

Accept SMS micropayments in Polish Zloty currency with the Dotpay.pl SMS…


(2)
$39

Related Ads

Add related ads blocks on your ClassiPress single ad pages.


(8)
$10

Slider Ultimate

A unique featured ads slider for ClassiPress.


(16)
$19