Author Topic: Brain haz can I? Need help inserting massive amount of data into a table.

Karlos:

Don't the rows of your table have a primary key you can use for identification?

If you are inserting new rows, there are ways to optimize it into a smaller number of queries.

For example, most SQL implementations support a multi-row insert:

INSERT INTO my_table (foo, bar) VALUES (5, 'fish'), (33, 'lobster'), (17, 'football') ...

When you get your form data back, you can construct a string akin to the one above and issue it as your query. You might want to limit it to batches of no more than a few hundred inserts at once, depending on how much data there is.
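
As a rough illustration (not from the original post), here is a minimal Python sketch of that approach, assuming the MySQL Connector/Python driver; my_table, foo and bar are the example names from the INSERT above, and the batch size of 500 is an arbitrary choice.

# Minimal sketch: batch form rows into multi-row INSERTs of a few hundred
# values at a time, using placeholders so the driver handles quoting.
import mysql.connector

def insert_in_batches(conn, rows, batch_size=500):
    """rows is a list of (foo, bar) tuples taken from the submitted form."""
    cursor = conn.cursor()
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # One placeholder pair per row: (%s, %s), (%s, %s), ...
        placeholders = ", ".join(["(%s, %s)"] * len(chunk))
        sql = f"INSERT INTO my_table (foo, bar) VALUES {placeholders}"
        # Flatten the tuples so the parameter count matches the placeholders.
        params = [value for row in chunk for value in row]
        cursor.execute(sql, params)
    conn.commit()

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="mydb")
insert_in_batches(conn, [(5, "fish"), (33, "lobster"), (17, "football")])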

If you are updating rows, it comes down to looping over each row from the form and updating the corresponding row in the table. You need some unique column (or group of columns) to match against. If your table has a primary key, use that.
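
A minimal sketch of the per-row update case, again assuming MySQL Connector/Python; my_table, its id primary key and the bar column are hypothetical names used purely for illustration.

# Each row is matched against the primary key; this really is one UPDATE
# per row, which is why it is slower than the batched insert case.
import mysql.connector

def update_rows(conn, rows):
    """rows is a list of (new_bar_value, id) tuples from the form."""
    cursor = conn.cursor()
    cursor.executemany("UPDATE my_table SET bar = %s WHERE id = %s", rows)
    conn.commit()

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="mydb")
update_rows(conn, [("fish", 5), ("lobster", 33)])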

In a voice minutes system I wrote and optimized at work, I have to deal with selecting literally tens of thousands of rows from tables containing millions and turning them into various kinds of reports.

One of the operations involves assigning a price to each call record for a given client, which is tantamount to updating every row in a temporary table that has been determined to be an external call (this is worked out during the generation of that table). So you are looking at updating about 1/3 to 1/2 of the entire dataset, which can be over 100K rows. This was using MySQL as the backend.

I found the fastest solution by far, in this case, was to SELECT all the required rows, process them in the code to assign the price, and write each result row to a temporary CSV file with the same column structure as the temporary table.
Next I delete all the rows from the temporary table I just processed. This is an especially fast step since the query cache already knows which rows were affected by the WHERE clause from the previous SELECT statement used to grab the data.
Finally, I use a LOAD DATA INFILE statement to pull the temporary data back into the table. Obviously you need the FILE permission for this to work.
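
The Python sketch below pulls those steps together, again assuming MySQL Connector/Python; the table name call_temp, its columns (call_id, duration, price, is_external) and the price_call() function are all hypothetical stand-ins for the real voice-minutes schema, and the script is assumed to run on the database host so the server can read the CSV it writes.

import csv
import mysql.connector

def price_call(duration):
    # Placeholder for the real per-client pricing logic.
    return duration * 0.01

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="mydb")
cursor = conn.cursor()

# 1. Pull every row that still needs a price.
cursor.execute("SELECT call_id, duration FROM call_temp WHERE is_external = 1")
rows = cursor.fetchall()

# 2. Assign the price in code and write the finished rows to a CSV file
#    with the same column layout as the temporary table.
with open("/tmp/priced_calls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for call_id, duration in rows:
        writer.writerow([call_id, duration, price_call(duration)])

# 3. Delete the rows that have just been processed.
cursor.execute("DELETE FROM call_temp WHERE is_external = 1")

# 4. Bulk-load the priced rows back into the table. This needs the FILE
#    privilege, and the server must be able to read the CSV path.
cursor.execute("LOAD DATA INFILE '/tmp/priced_calls.csv' INTO TABLE call_temp "
               "FIELDS TERMINATED BY ',' "
               "(call_id, duration, price)")
conn.commit()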

This was literally hundreds of times faster than updating each row one by one.