If you want to index a lot of entities, it is not a good idea to use `solr:index:populate`. The command works well for 100k-200k entities; anything larger than that makes it incredibly slow and memory-hungry, because Doctrine is not designed to hydrate hundreds of thousands of entities at once.
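For comparison, the usual way to keep Doctrine's memory usage bounded is to iterate the result set and clear the EntityManager in batches. A minimal sketch, assuming an EntityManager `$em` and a `Person` entity (both names are illustrative; this is not what `solr:index:populate` does internally):

```php
// Iterate the result set and detach entities in batches so the
// identity map does not grow without bound.
$batchSize = 200;
$i = 0;
$query = $em->createQuery('SELECT p FROM AppBundle\Entity\Person p');

foreach ($query->iterate() as $row) {
    $person = $row[0];
    // ... add $person to the index ...
    if (++$i % $batchSize === 0) {
        $em->clear(); // detach all managed entities, free memory
    }
}
$em->clear();
```

Even with this pattern the hydration overhead remains, which is why the approaches below bypass Doctrine entirely.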
In the following example I have a `person` table with 5,000,000 rows and three columns: `id`, `name` and `email`. The resulting documents are schemaless, so all fields have a suffix, e.g. `name_s`.
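For illustration, this is the shape of a single document as it ends up in the index (the values are made up); the `_s` suffix matches the dynamic string-field rule (`*_s`) of Solr's default schemaless configuration:

```php
// Illustrative document shape; *_s marks a dynamic string field.
$document = [
    'id'      => 42,                 // unique key
    'name_s'  => 'Jane Example',     // dynamic string field
    'email_s' => 'jane@example.com', // dynamic string field
];
```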
Here are some approaches that have worked well for me:
## MySQL CSV export + post-tool

This solution does not use PHP at all.

- export your data to a CSV file with the SQL script below (save it as `dump_person_table.sql`):
```sql
SET @TS = DATE_FORMAT(NOW(),'_%Y_%m_%d_%H_%i_%s');
SET @FOLDER = '/tmp/'; -- target dir
SET @PREFIX = 'person';
SET @EXT = '.csv';

-- the first SELECT defines the header of the csv-file
SET @CMD = CONCAT("SELECT 'id', 'name_s', 'email_s' UNION ALL SELECT * FROM person INTO OUTFILE '",@FOLDER,@PREFIX,@TS,@EXT,
    "' FIELDS ENCLOSED BY '\"' TERMINATED BY ',' ESCAPED BY '\"'",
    " LINES TERMINATED BY '\r\n';");

PREPARE statement FROM @CMD;
EXECUTE statement;
```
Then run the script:

```bash
mysql -udbuser -p123 dbname < dump_person_table.sql
```

The resulting file looks like this: `/tmp/person_2017_03_01_11_21_41.csv`. Note that, depending on your server configuration, MySQL's `secure_file_priv` option may restrict the directories `INTO OUTFILE` is allowed to write to.
- index the CSV with the post tool:

```bash
/opt/solr/solr-5.5.2/bin/post -c core0 /tmp/person_2017_03_01_11_21_41.csv
```
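The post tool recognizes the `.csv` extension and sends the file to the core's CSV update handler; the header row produced by the first `SELECT` tells Solr which field each column maps to.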
## PDO Select + Solarium BufferedAdd
The script has two parts:
- select a chunk of rows from the DB
- add the rows to the index with Solarium
```php
<?php

require 'vendor/autoload.php';

$connection = new PDO('mysql:host=localhost;dbname=dbname;charset=utf8mb4', 'dbuser', '123');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$statement = $connection->prepare('SELECT COUNT(*) AS total_items FROM person');
$statement->execute();
$countResult = $statement->fetch(PDO::FETCH_ASSOC);
$statement->closeCursor();

$totalItems = $countResult['total_items'];
$batchSize = 5000;
$pages = (int) ceil($totalItems / $batchSize);

$client = new Solarium\Client([
    'endpoint' => [
        'localhost' => [
            'host' => 'localhost',
            'port' => 8983,
            'path' => '/solr/core0',
        ]
    ]
]);

/** @var \Solarium\Plugin\BufferedAdd\BufferedAdd $buffer */
$buffer = $client->getPlugin('bufferedadd');
$buffer->setBufferSize($batchSize); // auto-flush to Solr whenever the buffer is full

for ($i = 1; $i <= $pages; $i++) {
    // page $i covers the ids ($i - 1) * $batchSize + 1 .. $i * $batchSize
    $limitStart = ($i - 1) * $batchSize + 1;
    $limitEnd = $i * $batchSize;

    $statement = $connection->prepare('SELECT id, name, email FROM person WHERE id >= :start AND id <= :end');
    $statement->execute(['start' => $limitStart, 'end' => $limitEnd]);

    foreach ($statement->fetchAll(PDO::FETCH_ASSOC) as $item) {
        $buffer->createDocument([
            'id' => $item['id'],
            'name_s' => $item['name'],
            'email_s' => $item['email']
        ]);
    }
    $statement->closeCursor();

    echo sprintf('Indexing page %s / %s', $i, $pages) . PHP_EOL;
}

// send any remaining buffered documents and commit once at the end
$buffer->commit();
```
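Because the buffer size equals the batch size, the BufferedAdd plugin flushes each page to Solr automatically; the single `commit()` at the end then makes everything visible in one go. Also note that paginating by id range assumes the `id` column is contiguous and starts at 1; for tables with gaps you would paginate with `ORDER BY id LIMIT ... OFFSET ...` instead.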
## PDO CSV export + post-tool

This solution exports the table to CSV files using PDO. The exported files are located under `/tmp/export`.
```php
<?php

$connection = new PDO('mysql:host=localhost;dbname=dbname;charset=utf8mb4', 'dbuser', '123');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$statement = $connection->prepare('SELECT COUNT(*) AS total_items FROM person');
$statement->execute();
$countResult = $statement->fetch(PDO::FETCH_ASSOC);
$statement->closeCursor();

$totalItems = $countResult['total_items'];
$batchSize = 10000;
$pages = (int) ceil($totalItems / $batchSize);

@mkdir('/tmp/export');

for ($i = 1; $i <= $pages; $i++) {
    // page $i covers the ids ($i - 1) * $batchSize + 1 .. $i * $batchSize
    $limitStart = ($i - 1) * $batchSize + 1;
    $limitEnd = $i * $batchSize;

    $statement = $connection->prepare('SELECT id, name, email FROM person WHERE id >= :start AND id <= :end');
    $statement->execute(['start' => $limitStart, 'end' => $limitEnd]);

    $handle = fopen(sprintf('/tmp/export/person_%s.csv', $i), 'w');
    // the header row tells Solr's CSV handler which field each column maps to
    fputcsv($handle, ['id', 'name_s', 'email_s']);

    foreach ($statement->fetchAll(PDO::FETCH_ASSOC) as $item) {
        fputcsv($handle, [$item['id'], $item['name'], $item['email']]);
    }

    fclose($handle);
    $statement->closeCursor();

    echo sprintf('Exporting page %s / %s', $i, $pages) . PHP_EOL;
}
```
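Using `fputcsv()` instead of hand-built `sprintf()` lines takes care of quoting and escaping, so values containing commas or quotes do not corrupt the CSV.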
To import the data we use the Solr post tool again, this time pointed at the whole export directory:

```bash
/opt/solr/solr-5.5.2/bin/post -c core0 /tmp/export
```
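Pointed at a directory, the post tool indexes every file with a known extension (`.csv` included) and issues a commit once it is done.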