Experience with large datasets?

Does anyone have experience running any of the processing steps with large datasets, such as >10,000 images?

I recently tried to process Step 1 using the 3D Maps template with 12,000 images. I received a low-memory error and the computer crashed with a BSOD. Resources were maxed out at full CPU (8 threads) and full RAM (64 GB).

I am going to try re-running it with the Pix4D resources stepped down a bit.

PC specs:

  • i7-7700
  • GTX 1060 6 GB
  • 64 GB RAM
  • Samsung 960 250 GB SSD

 

I am wondering whether this issue is a limitation of the program with large datasets, or an issue with my hardware. Any experience would be greatly appreciated.

 

Thank you

Hey Apex,

I have processed thermal projects with over 15,000 images. I have never created a 3D project before, though, so I am guessing 3D projects may require more resources.


Thank you, Selim. Do you have a machine similar in specs to the one I listed above? How do you have the resources set in Pix4D? I am trying to rule out whether my BSOD is caused by the resources being set to maximum for Pix4D, which leaves nothing for other operations on the PC, or whether it is a hardware issue.

I have a similar computer setup, and another that is an i9 with 128 GB RAM and a GTX 1080 Ti with 11 GB VRAM. I am running huge datasets, and depending on the Pix4D version and settings, the lower-end computer can have issues.

The real question is how many pixels you are processing and under what settings in Pix4D. In general I only run out of memory when doing a custom high-resolution mesh, and even that can be mitigated by using a swap file in Windows. Take the megapixel count of your pictures and multiply it by the total number of pics… my largest dataset ran just fine at 132 billion total pixels with image scale 1x in both Step 1 and Step 2.
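As a rough sketch of that arithmetic (a hypothetical helper and example numbers, not anything built into Pix4D):

```python
def total_gigapixels(num_images: int, mp_per_image: float) -> float:
    """Total pixel count of a dataset in gigapixels (GP): images x megapixels / 1000."""
    return num_images * mp_per_image / 1000.0

# Illustrative example: 6,600 images at 20 MP each works out to the
# ~132 billion total pixels (132 GP) mentioned above.
print(total_gigapixels(6600, 20.0))   # -> 132.0
```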

I am not sure 3D projects require more resources, but that is the template I started with on these cell tower projects. If you are close to my dataset size, I can say that a more powerful computer will definitely run it just fine (it will probably take 5-7 days).

Hey Apex,

I have multiple machines on which I have run over 15,000 images for thermal projects; those would take one to three days to finish with the default thermal template. The most powerful computer I used had similar specs to yours, except the SSD was larger and it had a GTX 1070 graphics card. Then again, I am guessing 3D map generation is probably more demanding than a 2D agriculture map, since the point cloud and mesh are more complex for a 3D map than for a 2D map. Step 2, where the point cloud is generated, requires a lot of memory/RAM. As I mentioned earlier, though, I have never created a 3D map, so I am not sure exactly what is going on with 3D projects. However, I have created close to 2,000 agriculture-specific projects, ranging anywhere from 50 to 15,000 images.


Selim, what is the size in MP of those images? Quantity of pics is no issue…it is the total pixel count.

Even though the software can theoretically handle an unlimited number of images at the same time, from experience I would not recommend running more than about 5,000 images at 12 MP together. Even for that you would need a monster of a machine, such as the one Adam described (128 GB RAM, …).

Note that for thermal or multispectral the number of images can be higher, due to the lower resolution. For thermal I usually see a resolution of 640x480 (about 0.3 MP), and for multispectral images such as those from the Sequoia camera it is 1.2 MP.

A major problem, even if the processing finishes, is the enormous size of the output files, which makes them unreadable in most other third-party software. Have you found a workaround for that?

I’d recommend reading this article about possible errors and solutions for large datasets: https://support.pix4d.com/hc/en-us/articles/202560909

This article also describes a way to split and merge large projects: https://support.pix4d.com/hc/en-us/articles/202558579 

Pierangelo, your reference point is only 60 billion pixels, and while that is a lot, your software can easily handle double that with 128 GB of RAM. Don’t sell Pix4Dmapper Pro Desktop short!!!

But I will say that the cluster size gets larger than I like at 130 billion pixels, so for right now I would target 100-120 billion pixels as a reasonable top end, though not a hard limit. That means you could run 100,000 multispectral images from a Sequoia camera… wow…
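A quick sanity check of those numbers (illustrative arithmetic only, using the resolutions quoted earlier in this thread):

```python
# Roughly how many images fit in a 120 GP budget at different sensor resolutions.
budget_gp = 120
for sensor, mp in [("Sequoia multispectral", 1.2),
                   ("12 MP RGB", 12.0),
                   ("640x480 thermal", 0.3)]:
    max_images = int(budget_gp * 1000 / mp)
    print(f"{sensor}: ~{max_images:,} images")

# Sequoia multispectral: ~100,000 images
# 12 MP RGB: ~10,000 images
# 640x480 thermal: ~400,000 images
```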


Hey Adam,

As Pierangelo mentioned above, the thermal and multispectral (Sequoia) images are low resolution, so the total pixel count is not very high. As for RGB, we are using the Sensefly SODA camera, which I think is 20 MP, and I have created projects using 2,000 to 2,500 images. They usually take about 7-8 hours to process.

I’m currently processing 6,000 50 MP images. At 1/8 image scale, Step 1 took 16 hours. We have over a hundred GCPs and checkpoints on the site as well. At 1/4 scale, it has been processing for 20 hours now and still isn’t finished. I had frequent lockups on similar projects, but I’ve backed off the resources (120/132 GB RAM, and down one core) and haven’t had a problem with BSODs or “stopped working” messages. The machine is a Core i7-7900, 128 GB, 1080, Samsung 960 Pro. I wish the internal project splitter would allow for a greater number of image overlaps. I have had zero luck reconstituting a project that was split prior to Step 1 and then merged.

Doug, that is certainly a big dataset at 300 GP; I haven’t run anything over 200 GP yet. Turning down the RAM slightly and dropping one core to avoid the crashes is a great tip.

Is that a Core i9 CPU? I assume it is, because of the 128 GB of RAM.

What is the “internal project splitter”? I would think that with that many GCPs you would have zero issues merging sub-projects.

I am also surprised you are turning down the Step 1 quality by using less than 1/2 to 1 scale with such a high-res camera. My 3D modeling projects of cell towers are always run with scale = 1.

Adam,

Core i7-6900K. I’m not sure if having all that RAM really makes a difference, but it is nice to run Step 2 in a single block. We are thinking of ways to run 100K+ image sets for some corridor work we are considering. If you had a $30K budget, would you choose a few Core i9 machines or twice as many 4 GHz i7s? RAM? We’ll split the project into 2,000-3,000-image blocks and then reconstitute the point clouds in other software.

 

 

A parallel processing approach will serve you well. Will acquiring GPUs to meet project timelines be an issue?

Doug, you may want to look into how Pix4D handles one cluster vs. multiple clusters for point cloud creation… it is very interesting. All of my future hardware purchases will be Core i9-7900X. Again, the picture count doesn’t matter; it is the gigapixels that determine the hardware or sub-projects needed. 120-150 GP per project per computer is comfortable in 3-4 days on my $6,000 setup. The only alternative is to use Pix4Dengine, and it doesn’t quite fit my workflow, so I am not testing it yet.
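As a rough planning sketch of how that guideline translates into sub-project counts (the 300 GP figure is Doug’s dataset from above; the rest of the numbers are illustrative):

```python
import math

dataset_gp = 300        # e.g. 6,000 images x 50 MP = 300 GP
gp_per_machine = 130    # middle of the 120-150 GP per-machine comfort range
days_per_run = 3.5      # roughly 3-4 days per sub-project on a comparable machine

subprojects = math.ceil(dataset_gp / gp_per_machine)
print(subprojects, "sub-projects,", subprojects * days_per_run, "machine-days if run serially")
# -> 3 sub-projects, 10.5 machine-days if run serially
```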

 

Gary, high-end GPUs can still be found in volume at regular prices if you know where to look 🙂

Old thread, but still relevant.   

Ethos Geological recently processed a single contiguous project with 21,509 images (~377 GP). A HEAP of work!

Project and computer set up:

  • 43 GCPs
  • 20 square miles
  • 21,509 images
  • 377 gigapixels
  • 750 GB SSD
  • 64-core machine (i9 equivalent)
  • 250 GB RAM
  • 1 single Tesla T4 GPU (I think; maybe it was a T100, can’t recall now)

Timing:

  • 9 Days with low-res 3D point cloud on Step 2
  • 17 Days with ‘optimized’ 3D point cloud on Step 2

Salient take-aways:

  • Disable ALL Windows automatic update/restart/everything. Then remember you did this, as you’ll have to manually update or re-enable it afterward.
  • Create as many GCPs as possible if breaking the project up into multiple subprojects.
  • Elevation data along the edges is always incorrect. This isn’t Pix4D’s fault, but a consequence of an algorithm that has to interpolate data… hence the edges of subprojects never join nicely with each other without a dense row of MTPs/GCPs stitching the seams. If it’s a big project, do your best to plan GCPs along the seams from the beginning, or else you’ll have to fake 'em.
  • Water/snow surfaces are evil. Especially waves/ocean.
  • You will rerun the model at least once, so plan on the timing above x2. Why? Because we’ve found there are almost always tweaks to the model (typically adding more GCPs, manufactured or actual) that improve the elevation.

Though not identical to the above, here is a snapshot of CPU activity from a project half this size, using a low-res 3D point cloud derived from ~11,000 images, which took 50 hours to complete.

 


Hi Scott,

Thank you for sharing this with the community. It’s a very interesting case study and the hints you provide are very useful.

We did a webinar, **Pushing the limits: Mapping large areas with drones**, that covers a similar topic. I highly recommend it.

Best,

Is there a way to recover a project that crashed near the end of Step 1? A 14,000-photo project that had been processing for 6 days just crashed very near the end of Step 1. Do I just have to break it into smaller projects and start over?

Hi Aaron,
Unfortunately, if the project crashed, you cannot recover it. As you already mentioned, it is recommended to break it down and process the smaller projects again.
Keep in mind that we are developing a new product that will be focused on large-dataset processing. You can subscribe to this page to get more news: Pix4Dmatic
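For anyone planning such a split, here is a rough sketch of how one might chunk an image list under a gigapixel cap (this is not a Pix4D feature; the helper and the assumed 20 MP image size are only for illustration):

```python
from typing import List

def split_into_subprojects(image_paths: List[str], mp_per_image: float,
                           max_gp_per_subproject: float = 120.0) -> List[List[str]]:
    """Chunk an image list so that each sub-project stays under a gigapixel cap."""
    images_per_chunk = int(max_gp_per_subproject * 1000 / mp_per_image)
    return [image_paths[i:i + images_per_chunk]
            for i in range(0, len(image_paths), images_per_chunk)]

# Hypothetical example: 14,000 images at an assumed 20 MP split into three
# sub-projects of 6,000 / 6,000 / 2,000 images (120 GP, 120 GP, 40 GP).
chunks = split_into_subprojects([f"IMG_{i:05d}.JPG" for i in range(14000)], mp_per_image=20.0)
print([len(c) for c in chunks])   # -> [6000, 6000, 2000]
```

In practice the sub-projects also need overlapping images and shared GCPs/MTPs along the seams, as Scott describes above, so a real split would be planned geographically rather than by simple list order.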

@aaron we didn’t have this issue in 2018/2019 (it was great, we did 21K images in a single push!), but now we DO have the issue of crashing on Step 1. It is as if Pix4Dmapper is now less capable of managing large datasets than it used to be. Unfortunately, I don’t have notes on the software/Windows versions used prior to the several upgrades that now plague our success, so we can’t revert.

I recently posted an issue related to this (Large datasets crash on step 1)… it would be great if you have anything to add to that discussion.