Replies: 2 comments 6 replies
In the usual configuration there are two threads doing COPYs: one for the "middle" tables (in slim mode only) and one for the output tables. Data is collected in chunks and then sent via a queue to those threads for the actual COPY operation. We could use a thread pool instead of those two threads for the actual COPY, but we never thought this would improve the situation much. In the end the bottleneck is probably I/O, isn't it? Doing more of this in parallel means more contention on the WAL and, if we are writing to the same table in multiple COPYs at once, more contention on that table. So it is unclear to me why more parallelism would help significantly. Doing anything with multithreading in C++ code is always a pain, so keeping this code as simple as possible is also important. But maybe we are wrong there and didn't take some issue into account. If somebody wanted to try this, that would be great: we'd get actual data.
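To make the current design concrete, here is a rough sketch of the pattern described above: rows are buffered into chunks and handed over a queue to one writer thread per table group. It is written in Python purely for illustration (the real code is C++), and everything in it is an assumption made for the sketch: the names `copy_worker`, `run_pipeline`, `CHUNK_SIZE`, the use of one queue per thread, and the fact that "writing" only appends to a list instead of sending COPY data over a database connection.

```python
import queue
import threading

CHUNK_SIZE = 3   # hypothetical value; real chunks would be much larger
STOP = object()  # sentinel that tells a worker to finish


def copy_worker(name, chunks, written):
    """Drain chunks from the queue; stands in for the actual COPY."""
    while True:
        chunk = chunks.get()
        if chunk is STOP:
            break
        # A real worker would stream the chunk over its own connection.
        written[name].extend(chunk)


def run_pipeline(rows):
    """Buffer (target, row) pairs into chunks and pass them to two workers."""
    targets = ("middle", "output")
    queues = {t: queue.Queue() for t in targets}
    written = {t: [] for t in targets}
    workers = [
        threading.Thread(target=copy_worker, args=(t, queues[t], written))
        for t in targets
    ]
    for worker in workers:
        worker.start()

    buffers = {t: [] for t in targets}
    for target, row in rows:
        buffers[target].append(row)
        if len(buffers[target]) >= CHUNK_SIZE:
            queues[target].put(buffers[target])  # hand over a full chunk
            buffers[target] = []

    for t in targets:
        if buffers[t]:
            queues[t].put(buffers[t])  # flush the partial last chunk
        queues[t].put(STOP)
    for worker in workers:
        worker.join()
    return written
```

With a single worker per queue, rows reach each table group in the order they were produced. Replacing the two fixed workers with a pool that shares a queue would be the thread-pool variant mentioned above, at the cost of losing that ordering guarantee.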
@tvondra When COPYing into the same table from multiple threads, do you see a possible issue with data ordering? What I mean is that in one case (the middle tables in slim mode), those tables are written in the order of their primary key id. I assume this is a good thing; at the very least, building the index should be faster. If we write from multiple threads, the table will not be as ordered. Do you foresee any issues there? (I wouldn't expect this to be a large issue, since later changes will leave the table unsorted anyway, but I just thought I'd check.)
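To illustrate what I mean by "not as ordered", here is a small Python model. It is only a sketch under strong assumptions: chunks of sorted ids are dealt round-robin to the writers, and the writers' rows land in the table strictly alternately. Real interleaving depends on thread scheduling and on how PostgreSQL places rows in pages, and the function names here are made up for the example.

```python
from itertools import zip_longest


def chunked(ids, size):
    """Split a list of ids into consecutive chunks of the given size."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def interleaved_arrival(ids, size, n_writers):
    """Model the order in which rows arrive when n_writers take turns."""
    chunks = chunked(ids, size)
    # Chunks are dealt round-robin: writer w gets chunks w, w + n, w + 2n, ...
    per_writer = [
        [row for chunk in chunks[w::n_writers] for row in chunk]
        for w in range(n_writers)
    ]
    arrival = []
    # Assumed idealisation: writers alternate strictly, one row at a time.
    for rows in zip_longest(*per_writer):
        arrival.extend(r for r in rows if r is not None)
    return arrival


def count_descents(seq):
    """Count positions where an id is smaller than the one before it."""
    return sum(1 for a, b in zip(seq, seq[1:]) if b < a)


ids = list(range(1, 13))
print(interleaved_arrival(ids, 3, 1))  # one writer: ids stay sorted
print(interleaved_arrival(ids, 3, 2))  # two writers: ids arrive interleaved
```

In this model a single writer keeps the ids fully sorted, while two writers produce a sequence that is only roughly ordered: each id is close to its sorted position, but neighbouring rows are no longer monotonic.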
Hi,
I've been doing some testing with OSM data, and I noticed that a significant part of the load is taken by COPY, which happens without parallelism. Are there any plans to parallelize this, either by loading multiple tables concurrently, or splitting the data into smaller chunks and loading them through multiple connections?
I'm not very familiar with the data structure, so maybe there are dependencies that make this impossible / inefficient. But it's a bit sad to not be able to better utilize available hardware resources.