Describe the bug
How did you handle this issue if you run many spark jobs which is running pydeequ checks at the same time?
in my case only one job is running rest of the other jobs are failing with this issue when the other jobs are running in the emr cluster.
Traceback (most recent call last):
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 2207, in start
OSError: [Errno 98] Address already in use
During handling of the above exception, another exception occurred: Any solutions or suggestions
Traceback (most recent call last):
File "/opt/ammar/pydeequ_poc_pyspark.py", line 26, in
_check = _check_func(*_args)
File "/usr/local/lib/python3.7/site-packages/pydeequ/checks.py", line 134, in hasSize
assertion_func = ScalaFunction1(self._spark_session.sparkContext._gateway, assertion)
File "/usr/local/lib/python3.7/site-packages/pydeequ/scala_utils.py", line 32, in init
super().init(gateway)
File "/usr/local/lib/python3.7/site-packages/pydeequ/scala_utils.py", line 16, in init
self.gateway.start_callback_server()
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1894, in start_callback_server
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 2216, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to start the callback server (127.0.0.1:25334)
21/11/19 13:44:06 INFO SparkContext: Invoking stop() from shutdown hookcise description of what the bug is.
To Reproduce
Steps to reproduce the behavior:
Run multiple pydeequ checks at the same time in an emr cluster
Describe the bug
How did you handle this issue if you run many spark jobs which is running pydeequ checks at the same time?
in my case only one job is running rest of the other jobs are failing with this issue when the other jobs are running in the emr cluster.
Traceback (most recent call last):
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 2207, in start
OSError: [Errno 98] Address already in use
During handling of the above exception, another exception occurred: Any solutions or suggestions
Traceback (most recent call last):
File "/opt/ammar/pydeequ_poc_pyspark.py", line 26, in
_check = _check_func(*_args)
File "/usr/local/lib/python3.7/site-packages/pydeequ/checks.py", line 134, in hasSize
assertion_func = ScalaFunction1(self._spark_session.sparkContext._gateway, assertion)
File "/usr/local/lib/python3.7/site-packages/pydeequ/scala_utils.py", line 32, in init
super().init(gateway)
File "/usr/local/lib/python3.7/site-packages/pydeequ/scala_utils.py", line 16, in init
self.gateway.start_callback_server()
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1894, in start_callback_server
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 2216, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to start the callback server (127.0.0.1:25334)
21/11/19 13:44:06 INFO SparkContext: Invoking stop() from shutdown hookcise description of what the bug is.
To Reproduce
Steps to reproduce the behavior:
Run multiple pydeequ checks at the same time in an emr cluster