From time to time I see the following error message when using the IBM backend, and I would like to know whether you are also experiencing this and maybe have an idea what the root cause is:
While running a circuit that normally executes just fine, I get the following exception log
../../.local/lib/python3.5/site-packages/projectq/cengines/_main.py:304: in flush
self.receive([Command(self, FlushGate(), ([WeakQubitRef(self, -1)],))])
../../.local/lib/python3.5/site-packages/projectq/cengines/_main.py:266: in receive
self.send(command_list)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <projectq.cengines._main.MainEngine object at 0x7fbfb2ee2ac8>, command_list = [<projectq.ops._command.Command object at 0x7fbfb2ee2630>]
def send(self, command_list):
"""
Forward the list of commands to the next engine in the pipeline.
It also shortens exception stack traces if self.verbose is False.
"""
try:
self.next_engine.receive(command_list)
except:
if self.verbose:
raise
else:
exc_type, exc_value, exc_traceback = sys.exc_info()
# try:
last_line = traceback.format_exc().splitlines()
compact_exception = exc_type(str(exc_value) +
'\n raised in:\n' +
repr(last_line[-3]) +
"\n" + repr(last_line[-2]))
compact_exception.__cause__ = None
> raise compact_exception # use verbose=True for more info
E Exception: Failed to run the circuit. Aborting.
E raised in:
E ' File "/home/cgogolin/.local/lib/python3.5/site-packages/projectq/backends/_ibm/_ibm.py", line 295, in _run'
E ' raise Exception("Failed to run the circuit. Aborting.")'
../../.local/lib/python3.5/site-packages/projectq/cengines/_main.py:288: Exception
and on the console I then see:
- There was an error running your code:
502 Server Error: Bad Gateway for url: https://quantumexperience.ng.bluemix.net/api/users/login
The frequency of this error seems to be independent of the type of circuit I run, and I get it from time to time regardless of the type of internet connection I use, so I can exclude simple connection problems on my end.
Running with verbose=True reveals that the source of the error is in _run(self) around line 260 in _ibm.py, namely:
> counts = res['data']['counts']
E TypeError: 'NoneType' object is not subscriptable
i.e., res = send(...) returned None instead of actual results.
A straightforward workaround for me is to simply rerun send(...) until it returns a non-None result, e.g., as follows:
if self._retrieve_execution is None:
    res = None
    retries = 10
    while res is None and retries > 0:
        retries -= 1
        res = send(info, device=self.device,
                   user=self._user, password=self._password,
                   shots=self._num_runs, verbose=self._verbose)
In practice I virtually never need more than a second attempt to get a result. This makes me believe that the problem is also not related to me sending too many queries or to other rate-limiting mechanisms.
I have found other people reporting similar spurious 502 errors on Bluemix. Their application is (probably) not quantum related at all, so maybe we are just suffering from some classical middleware misconfiguration?
Could/Should ProjectQ handle such errors more gracefully?
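For what it's worth, here is a rough sketch of what more graceful handling could look like: retry the request a bounded number of times with exponential backoff, so transient 502s get a chance to clear. The function name, the send_once stand-in for the actual call to send(...), and the defaults are just my suggestion, not anything that exists in ProjectQ today:

```python
import time


def send_with_retries(send_once, max_retries=5, base_delay=1.0):
    """Call send_once() until it returns a non-None result.

    Waits base_delay seconds after the first failure and doubles the
    delay after each subsequent one (exponential backoff).  Returns
    None if every attempt failed.
    """
    delay = base_delay
    for attempt in range(max_retries):
        res = send_once()
        if res is not None:
            return res
        if attempt < max_retries - 1:  # no need to sleep after the last try
            time.sleep(delay)
            delay *= 2
    return None
```

The backoff is the main difference to my fixed-interval workaround above; it keeps the load on the server low while still recovering quickly from a single bad gateway response.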
Digging a little deeper, I see that send() calls _get_result(), and the latter already has a retry mechanism built in. The only way I can see in which _get_result(), and then send(), can return None without raising an exception is if the JSON returned by requests.get() contains the element r_json['qasms'][0]['result'] and this element is None. I can thus also fix the problem by adding the check and qasm['result'] is not None to the second-to-last line of the following code in _get_result():
for retries in range(num_retries):
    r = requests.get(urljoin(_api_url, suffix),
                     params={"access_token": access_token})
    r.raise_for_status()
    r_json = r.json()
    if 'qasms' in r_json:
        qasm = r_json['qasms'][0]
        if 'result' in qasm and qasm['result'] is not None:
            return qasm['result']
On a related note: aren't the default values num_retries=3000 and interval=1 of _get_result(), which amount to a total waiting time of nearly one hour before timing out, a bit long? Wouldn't it be nice to make them user-customizable?
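Something along these lines is what I have in mind: expose the two knobs so that the effective timeout, roughly num_retries * interval seconds, becomes configurable by the user. poll_for_result and fetch_once are hypothetical names standing in for the polling loop inside _get_result():

```python
import time


def poll_for_result(fetch_once, num_retries=300, interval=1.0):
    """Poll fetch_once() until it returns a non-None result.

    Both knobs are caller-visible: the total waiting time before
    giving up is roughly num_retries * interval seconds.
    """
    for _ in range(num_retries):
        result = fetch_once()
        if result is not None:
            return result
        time.sleep(interval)
    raise Exception(
        "No result after {:.0f} s".format(num_retries * interval))
```

With defaults like these the timeout drops to five minutes, and anyone who really wants to wait an hour for a busy device can still pass num_retries=3600.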