libcurl C++: long delay in detecting a change in an HTTPS response code

  c++, curl, https, libcurl, web-scraping

Problem:

Using libcurl, the program below polls a web page over HTTPS and waits for its status code to change from 503 to 200. Recently, without any changes to the code, a substantial delay (around 18 seconds) has appeared between sending the final request (just as the page is being updated) and receiving the response that reports 200. As a result, the page is updated with code 200 long before the program detects this and prints its output.

It’s almost as if the program "hangs" at the moment the page updates and cannot send or receive any requests, but only while the status code is changing.

Notes:

  • Spikes in web traffic are not causing this delay. This is a well-known site that can handle high traffic
  • The owner of this site has said that no other users are experiencing these delays, and has confirmed from their back end that my IP address received the data in a timely manner
  • The machine running the program has a typical latency of around 3 milliseconds to the server hosting the webpage
  • Using Ubuntu 20.04.1 LTS with libcurl installed using sudo apt-get install libcurl4-openssl-dev

Details:

This issue began to appear around mid-July 2021. Until then, the program would poll for around 10 seconds in the run-up to a predetermined time of 18:45. At 18:45, the page would update its status code to 200 and the program would usually pick up the new data within around 100 milliseconds.

As seen further below, the program uses a while loop to send requests and read back the response code. In the run-up to the page update, 503 is returned each time as expected; when 200 is detected, a break exits the loop and the output is printed.
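To see which iteration of that loop stalls, each call could be timed with a monotonic clock. The following is only a minimal sketch: `PollResult`, `poll_until_ok`, `max_attempts` and `warn_after` are hypothetical names that are not part of my program, and `fetch_status` stands in for whatever function returns the status code.

```cpp
#include <chrono>
#include <functional>
#include <iostream>

// Summary of one polling run (hypothetical helper type).
struct PollResult {
    int attempts = 0;                       // number of calls made
    std::chrono::milliseconds slowest{0};   // duration of the slowest call
    long final_code = 0;                    // last status code returned
};

// Calls fetch_status() until it returns 200 or max_attempts is reached.
// Every call is timed, and calls slower than warn_after are logged.
PollResult poll_until_ok(const std::function<long()> &fetch_status,
                         int max_attempts,
                         std::chrono::milliseconds warn_after) {
    using clock = std::chrono::steady_clock;
    PollResult result;

    while (result.attempts < max_attempts) {
        const auto start = clock::now();
        result.final_code = fetch_status();
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::milliseconds>(clock::now() - start);

        ++result.attempts;
        if (elapsed > result.slowest) {
            result.slowest = elapsed;
        }
        if (elapsed > warn_after) {
            std::cerr << "attempt " << result.attempts << " took "
                      << elapsed.count() << " ms (code "
                      << result.final_code << ")" << std::endl;
        }
        if (result.final_code == 200) {
            break; // Page updated
        }
    }
    return result;
}
```

With the program below, the callable would presumably be something like `[&] { return request(curl, url); }`. Using `std::chrono::steady_clock` rather than the system clock keeps the measurement unaffected by clock adjustments.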

To diagnose this, libcurl has been uninstalled and reinstalled, and separate servers with different IP addresses have been used (just in case the initial IP address was blocked from making requests for whatever reason). The issue still persists.
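A further diagnostic that might narrow this down is to break a slow transfer into phases. After `curl_easy_perform`, libcurl exposes cumulative timings in seconds through `curl_easy_getinfo` (`CURLINFO_NAMELOOKUP_TIME`, `CURLINFO_CONNECT_TIME`, `CURLINFO_APPCONNECT_TIME`, `CURLINFO_PRETRANSFER_TIME`, `CURLINFO_STARTTRANSFER_TIME`, `CURLINFO_TOTAL_TIME`). The sketch below assumes those six values have already been read into a plain struct; `TransferTimings`, `slowest_phase` and the phase labels are hypothetical names chosen for illustration, not part of libcurl.

```cpp
#include <string>
#include <utility>

// Cumulative timings in seconds, each measured from the start of the
// transfer, as reported by the CURLINFO_*_TIME values named above.
struct TransferTimings {
    double namelookup;     // DNS resolution finished
    double connect;        // TCP connection established
    double appconnect;     // TLS handshake finished
    double pretransfer;    // ready to start the transfer
    double starttransfer;  // first byte of the response received
    double total;          // transfer complete
};

// Returns the label and duration (seconds) of the phase that took longest.
// A mark reported as 0 (for example on a reused connection) contributes
// no time, because each phase is measured from the latest mark seen so far.
std::pair<std::string, double> slowest_phase(const TransferTimings &t) {
    struct Mark { const char *phase; double at; };
    const Mark marks[] = {
        {"dns", t.namelookup},
        {"tcp_connect", t.connect},
        {"tls_handshake", t.appconnect},
        {"pre_transfer", t.pretransfer},
        {"server_wait", t.starttransfer},
        {"download", t.total},
    };

    std::pair<std::string, double> worst{"none", 0.0};
    double previous = 0.0;
    for (const Mark &mark : marks) {
        const double spent = mark.at - previous;
        if (spent > worst.second) {
            worst = std::make_pair(std::string(mark.phase), spent);
        }
        if (mark.at > previous) {
            previous = mark.at;
        }
    }
    return worst;
}
```

If the 18 seconds showed up under "server_wait", that would point at the time between sending the request and the first response byte; if it showed up under "dns" or "tls_handshake", it would suggest a new connection was being set up at the moment of the update.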

Current program:

#include "omp.h"
#include <iomanip>
#include <vector>
#include <iostream>
#include <string>
#include <chrono>
#include <future>
#include <algorithm>
#include <cstring>

#include <curl/curl.h>

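// Write callback: appends each chunk of the response body to the vector
// passed via CURLOPT_WRITEDATA. libcurl delivers size * nmemb bytes and
// expects that same count back; any other value aborts the transfer.
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {

        std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
        const size_t total = size * nmemb;
        response->insert(response->end(), ptr, ptr + total);
        return total;
}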

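// Send one GET request to the URL and return the HTTP status code
// (0 if the transfer failed before a status code was received)
long request(CURL *curl, const std::string &url) {

        std::vector<char> response;
        long response_code = 0; // stays 0 if the transfer fails

        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        curl_easy_setopt(curl, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);

        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

        // Empty string enables the in-memory cookie engine without loading a file
        curl_easy_setopt(curl, CURLOPT_COOKIEFILE, "");
        curl_easy_setopt(curl, CURLOPT_COOKIE, "");

        auto res = curl_easy_perform(curl);
        if (res != CURLE_OK) {
            // Report transport-level failures instead of silently retrying
            std::cerr << "curl_easy_perform failed: " << curl_easy_strerror(res) << std::endl;
            return response_code;
        }

        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);

        if (response_code == 200) {
            std::cout << "200 FOUND" << std::endl;

            // print out payload
            // print out timestamp using <chrono>
        }

        return response_code;
}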

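int main() {

        curl_global_init(CURL_GLOBAL_ALL);
        CURL *curl = curl_easy_init();
        if (curl == nullptr) {
            std::cerr << "curl_easy_init failed" << std::endl;
            curl_global_cleanup();
            return 1;
        }

        // Poll with the same handle until the page returns 200
        while (true) {
            long response_code = request(curl, "https://somewebsite.xyz");
            if (response_code == 200) {
                break; // Page updated
            }
        }
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 0;
}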

Summary questions:

Q1. Are there any mistakes in the above code that could be causing this behaviour?

Q2. Have there been any updates to libcurl over the last few months that may have affected my program? As previously stated, the program worked perfectly up until mid-July 2021, and the code has not been changed whatsoever.

Q3. Could I have changed some setting that would prevent the server from replying to my request quickly? Every request made while the webpage’s code is 503 returns very quickly, but as soon as the page is updated to 200, the request takes a long time.

Source: Windows Questions C++
