We have four Cisco ASR1002 routers with 500+ BGP neighbors on them. Discovery of those devices takes well over an hour, and sometimes as long as 2 hours. Running discovery with debug output shows that the hang is in the bgp-peers module. It starts right after the array is built with all the neighbors received from the ASR1002. The module then begins running its SQL SELECT statements, each of which includes a massive list of device_ids, and that makes them crawl.
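To make the pattern concrete, here is a minimal Python sketch of the kind of statement I mean, run against an in-memory SQLite database. The table name `bgp_peers`, its columns, and the helper names are all made up for illustration; this is not the actual schema or the bgp-peers module code. It also shows batching the device_ids into smaller IN lists, which is only an idea that might be worth testing, not a confirmed fix.

```python
import sqlite3


def build_in_query(device_ids):
    """Build a SELECT with one placeholder per device_id (hypothetical table)."""
    placeholders = ",".join("?" for _ in device_ids)
    return (
        "SELECT device_id, peer_ip FROM bgp_peers "
        f"WHERE device_id IN ({placeholders})"
    )


def chunked(ids, size):
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_peers(conn, device_ids, chunk_size=100):
    """Run one small IN query per chunk instead of one huge statement."""
    rows = []
    for chunk in chunked(device_ids, chunk_size):
        rows.extend(conn.execute(build_in_query(chunk), chunk).fetchall())
    return rows


# Demo data: 1000 hypothetical devices with one peer row each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bgp_peers (device_id INTEGER, peer_ip TEXT)")
conn.executemany(
    "INSERT INTO bgp_peers VALUES (?, ?)",
    [(i, f"10.0.{i // 256}.{i % 256}") for i in range(1000)],
)

# Look up peers for 500 device_ids, 100 ids per statement.
rows = fetch_peers(conn, list(range(500)), chunk_size=100)
print(len(rows))
```

Whether smaller IN lists would actually help on MySQL with the real tables and indexes is untested here; the sketch only shows the shape of the queries.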
I have attached the debug output from a discovery of a single affected ASR1002. I stopped the discovery in the section that slows to a crawl, so you should be able to see a few of the SQL statements with the massive list of device_ids. Each group of those statements takes 10-20 seconds to complete, and then it moves on to the next group, and the next. This module alone takes up to an hour to complete on a single router. The issue is causing all our discovery jobs to back up and is slowing down the server. Nothing ends up completing, and I'm forced to kill the jobs manually. For now I am running discovery manually for new devices so that I can avoid those four problem devices.
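As a rough sanity check on the timings above, this short Python snippet works out how many of those query groups would fit into the hour the module takes. It assumes the whole hour is spent in groups that each take 10-20 seconds, which is a simplification; the variable names are just for illustration.

```python
MODULE_SECONDS = 60 * 60   # "up to an hour" for the bgp-peers module
GROUP_SECONDS_FAST = 10    # fastest observed group of statements
GROUP_SECONDS_SLOW = 20    # slowest observed group of statements

# Fewer groups fit if each one is slow, more if each one is fast.
groups_min = MODULE_SECONDS // GROUP_SECONDS_SLOW
groups_max = MODULE_SECONDS // GROUP_SECONDS_FAST

print(f"{groups_min} to {groups_max} query groups per module run")
# prints "180 to 360 query groups per module run"
```

So under that assumption, the module is working through a few hundred of these groups on each run.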
We have a few Cisco 7200s and ASR9ks with almost as many BGP sessions on them, but they complete full discoveries in under 5 minutes. I'm not sure why these ASR1002s are so much slower.
We recently upgraded to the newest code, MySQL 10.4.16, and PHP 7.0.33.