[NNVM][TEST] Test against numerical grad #1505
yzhliu merged 23 commits into apache:master from sgrechanik-h:numgrad-pr
Conversation
@kazum @kevinthesun @srkreddy1238 could you please review this PR?
srkreddy1238
left a comment
@sgrechanik-h
That was a great start to standardize the helpers across various test cases 👍
A few comments to allow non-numpy forward/backward callers.
Otherwise this PR looks great, and I'm waiting to get this merged so that all other test cases can use this instead of local helpers.
    None (default) means that all input variables will be used. May be a
    list of pairs `(var_name, shape)`.
    ...
    np_forward : Callable[..., List[numpy.ndarray]], optional
Suggest giving a different name to np_forward: it need not be numpy always; it could be an implementation in any of the frontends we have.
Also receive and pass a flexible parameter 'attr' (a dictionary object with default value None) for this function, which can carry implementation-specific data like session info, etc.
    logging.debug(debug_stage)
    ...
    numpy_res = np_forward(**np_inputs_without_head_grads)
    np.testing.assert_allclose(nnvm_res[0], numpy_res, atol=atol, rtol=rtol)
In the context of the above comment on np_forward:
Please check if the result from np_forward is a list and, if yes, compare all arrays in the list.
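A minimal sketch of the comparison the reviewer suggests, using only numpy (`assert_forward_matches` is a hypothetical name, not part of the PR):

```python
import numpy as np

def assert_forward_matches(nnvm_res, reference_res, atol=1e-5, rtol=1e-5):
    # Hypothetical helper illustrating the suggestion: the reference
    # function may return either a single array or a list of arrays,
    # so normalize to a list and compare every output pairwise.
    if not isinstance(reference_res, (list, tuple)):
        reference_res = [reference_res]
    for out, ref in zip(nnvm_res, reference_res):
        np.testing.assert_allclose(out, ref, atol=atol, rtol=rtol)
```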
    dump_graph : bool
        Dump the graph even on success.
    """
Not sure an option for a custom comparison function argument makes any sense here.
Just raising it as it came across my mind.
    only_targets : Set[str]
        Test only for those targets from `ctx_list()` that are also in this set.
    ...
    numerical_grads : bool or "if_possible"
How about putting "if_possible" case inside numerical_grads=True?
    then check gradients numerically only if this graph can be created (i.e. if there are some
    operations with unimplemented gradients, it will just issue a warning).
    ...
    delta : float
Add optional to optional arguments.
    shape = shape.copy() if shape else {}
    ...
    grad_input_vars_real = []
    for x in grad_input_vars:
This logic looks complicated. How about just passing a list of symbols as grad_input_vars? Shape information can be defined in the shape argument (maybe input_shapes, a dict from str to tuple?) so that we don't have multiple shapes.
Yes, not having shapes in grad_input_vars would be cleaner, I'll fix it.
    dtypes = graph.json_attr('dtype')
    ...
    if shapes is None or dtypes is None:
        graph = graph.apply('InferShape').apply('InferType')
Here we assume set_shape_inputs and set_dtype_inputs have been called. It might be better to directly pass input_shapes and input_types into this function to make it more general.
    forward_graph = nnvm.graph.create(symbol)
    ...
    if dtype is not None:
We need to set_dtype_inputs. just check dtype is not None.
Sorry, I think I didn't understand this one.
If dtype is None, graph_to_function will raise exception? Same thing will happen for shape.
How about setting dtype as optional but not None as default value, such as dtype="float32"? For shape argument we should force user to pass in value. Then the code logic can be simpler.
Setting dtype='float32' as default would simplify a lot of tests a little bit. However, if some variables have their dtypes specified by attributes then either these attributes should be preferred or an error should be raised, so I'm not sure.
OK. Maybe we can put the shape/dtype check right in this function, before infer shape/dtype, and check either input variable shape/dtype attributes are set or shape/dtype arguments are passed.
Since shapes and dtypes of some inputs may be inferred from shapes and dtypes of other inputs, the proper solution would be to run inference passes, and only then check if some inputs are left without types. Then these inputs can be assigned the default type.
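That approach could be sketched as a small post-inference step (`fill_missing_dtypes` is a hypothetical name for illustration; in the real code this would run after the InferType pass):

```python
def fill_missing_dtypes(inferred_dtypes, default="float32"):
    # Hypothetical sketch: after the type-inference pass, any input
    # whose dtype is still unknown (None) falls back to the default,
    # while dtypes fixed by variable attributes or inference are kept.
    return {name: default if dt is None else dt
            for name, dt in inferred_dtypes.items()}
```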
    # nnvm_res contains the output and gradients (if they are needed)
    nnvm_res = main_function(**np_inputs)
    ...
    if np_forward is not None:
If np_forward is None, forward part comparison will be skipped? I think we should enforce np_forward to be set.
Sometimes we might want to check gradients only (or even only numerical gradients). But I think it is a good idea to require passing numerical_grads=True explicitly if no reference function is specified.
    # Since the result may be non-scalar, we have to put another operation on the top,
    # so we just multiply by the randomly generated head_grads.
    # This way we can reuse the gradient values which have already been computed.
    def function(**kwargs):
Change to a more meaningful name.
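The trick described in the quoted comment can be illustrated in isolation, with a more descriptive name (`make_scalar_function` is hypothetical; only numpy is assumed):

```python
import numpy as np

def make_scalar_function(forward, head_grads):
    # Sketch of the trick in the quoted comment: turn a possibly
    # non-scalar forward function into a scalar one by contracting
    # its output with fixed head_grads. The gradient of the resulting
    # scalar wrt the inputs matches the symbolic gradient that was
    # already computed with the same head_grads.
    def scalar_function(**inputs):
        return np.sum(forward(**inputs) * head_grads)
    return scalar_function
```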
    ----------
    function
        A function that takes inputs as keyword arguments (like `function(**input_values)`) and
        returns a scalar result. Should accept and return numpy arrays.
Description is not quite clear here. Scalar or ndarray?
        Relative tolerance.
    """
    ...
    if function_value is None:
We can just compute this value instead of setting it as an argument?
    inputs = [('x', dshape, x)]
    helper(y, inputs, dtype, forward, backward)
    inputs = [(x, dshape)]
    check_function(y, inputs, forward, backward, dtype=dtype)
Here we are setting grad_input_vars=inputs, but inputs are actually input_shapes. It would be more clear if we can refactor the interface of check_function.
    def check_numerical_grads(function, grad_input_vars, input_values, grad_values, function_value=None,
                              delta=1e-3, max_error=1e+3, max_discarded_frac=0.1,
                              atol=1e-2, rtol=1e-2):
        """A helper function that checks that numerical gradients of a function are equal to
Since this is new and will affect all backward operators, can you give more explanation of it?
If developers add new backward ops but find the check failing, how can they quickly debug the issue? Since this is not simply comparing with a numpy computation result, maybe we can have a tutorial for it. This would also be good for the stability of CI while we are introducing more and more backward ops.
    if grad.shape != input_values[x_name].shape:
        raise AssertionError(
            "Gradient wrt '{}' has unexpected shape {}, expected {} "
            .format(x.attr('name'), grad.shape, input_values[x_name].shape))
x is not defined here. x.attr('name') should be x_name.
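With that fix applied, the check might look like this (`check_grad_shapes` is a hypothetical name used just to make the sketch self-contained):

```python
import numpy as np

def check_grad_shapes(grad_values, input_values):
    # Sketch of the corrected check: report the mismatch using x_name
    # (the dict key) instead of the undefined variable x.
    for x_name, grad in grad_values.items():
        if grad.shape != input_values[x_name].shape:
            raise AssertionError(
                "Gradient wrt '{}' has unexpected shape {}, expected {}"
                .format(x_name, grad.shape, input_values[x_name].shape))
```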
    def check_numerical_grads(function, grad_input_vars, input_values, grad_values, function_value=None,
                              delta=1e-3, max_error=1e+3, max_discarded_frac=0.1,
The default value of delta is different from the check_function's one. Should be 1e-2?
Oh, it should be 1e-3 everywhere, thank you!
    return (function(**modified_values) - function_value)/a_delta
    ...
    for x_name in grad_input_vars:
        grad = grad_values[x_name]
We can replace these two lines with for x_name, grad in grad_values.items():. Then grad_input_vars is no longer necessary.
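For reference, the forward-difference scheme in the quoted return statement can be written out as a standalone sketch (`numerical_grad` is a hypothetical name; the real helper also discards points near singularities via max_error/max_discarded_frac):

```python
import numpy as np

def numerical_grad(function, input_values, x_name, delta=1e-3):
    # Minimal sketch of the forward-difference scheme from the quoted
    # line: perturb one element of one input at a time and divide the
    # change in the function value by delta.
    function_value = function(**input_values)
    grad = np.zeros_like(input_values[x_name], dtype=float)
    flat_grad = grad.reshape(-1)
    for i in range(flat_grad.size):
        modified = {k: v.copy() for k, v in input_values.items()}
        modified[x_name].reshape(-1)[i] += delta
        flat_grad[i] = (function(**modified) - function_value) / delta
    return grad
```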
    raise ValueError("numerical_grads must be a bool or 'if_possible', not {}"
                     .format(numerical_grads))
    ...
    input_vars = [x for x in symbol.list_input_variables()]
Can we simply replace this with input_vars = symbol.list_input_variables()?
Also added some tests. The main thing that is left to do is something like a tutorial on what to do if the numerical gradient check fails.

Simplified numerical grad checking, improved docs. Wasn't sure where to put the text on what to do when the numgrad check fails, so I put it under Python API/NNVM API/nnvm.testing. @kazum @kevinthesun @srkreddy1238 could you please review this again?
    shape : Dict[nnvm.Symbol or str, Tuple[int]] or Tuple[int], optional
        A dict mapping input variable names to shapes, or just a single shape.
        By default shapes will be inferred automatically.
How can we infer input shapes? I think this argument shouldn't be optional.
They are inferred from variables' attributes:

    x = sym.Variable("x", shape=(1, 2))
This interface doesn't have any constraint on the input symbol. So I can write some code like:

    data = sym.Variable("data")
    net = sym.conv2d(data, ...)
    check_function(net, shape=None, ...)

It will raise an exception later, since no shape information is provided? Currently this kind of input is marked as valid.
Also for kwargs, we can directly pass default values, instead of just None. Then we don't need to write a lot of "if xxx is None:" statements at the beginning.
Oh, the error message is not nice in this case. I think making it nicer and clarifying the documentation should be enough.
    if shape is not None:
        nnvm.compiler.graph_attr.set_shape_inputs(backward_graph, shape)
    ...
    backward_graph = backward_graph.apply('InferShape').apply('InferType')
https://github.com/dmlc/tvm/blob/master/nnvm/tests/python/compiler/test_top_level1.py#L32-L37
We just need to update the shape of head_grads in the input shape dictionary when compiling.
    Testing new operations
    ----------------------

    When adding new operations, it is a good idea to test them. Testing
    all_shapes = graph.json_attr('shape')
    all_dtypes = graph.json_attr('dtype')
    ...
    all_dtypes = [None if t == -1 else TCODE_TO_DTYPE[t] for t in all_dtypes]
TCODE_TO_DTYPE[-1] is defined as None in graph_attr.py. The if-else expression is not necessary.
    if None in dtype.values():
        raise ValueError("Input variables with no type: {}".format(dtype))
    ...
    if any([not s for s in shape.values()]):
not all(shape.values()) looks simpler.
    if (exclude_targets is not None and (target in exclude_targets or
                                         str(target) in exclude_targets)) or \
       (only_targets is not None and not (target in only_targets or
                                          str(target) in only_targets)):
The expression is a bit complicated. I'd suggest splitting them:

    if exclude_targets is not None:
        if target in exclude_targets or str(target) in exclude_targets:
            logging.info(...)
            continue
    if only_targets is not None:
        if target not in only_targets and str(target) not in only_targets:
            logging.info(...)
            continue
@srkreddy1238 @kevinthesun @kazum could you help to review again according to the new changes and explicitly approve if it looks good to you.
Thanks everyone. This is now merged.
* [NNVM][TEST] Numerical gradient testing
* [NNVM][TEST] Make some tests a little faster
* Fix the failing test_top_level3
* Target exclusion for the check_function
* Try to ignore singularities
* grad_input_vars now can't contain shapes
* Don't pass unnecessary grad_input_vars to check_function
* Multiple outputs; fixes; testing of check_function
* Use numerical_grads_params to pass parameters to numgrad checker
* Fail when no action is requested explicitly
* Pass additional params to functions
* Silence the linter issue
* Simplified numgrad checking
* Improved docs for check_function
* Fixed the error message when no dtype is provided
* Several fixes
* Tests with shape/dtype inference for inputs
* Don't check dense's grads on cuda
* Raise an error if output dtypes haven't been inferred
* Moved shape/dtype inference into a separate function; use float32 as fallback
* Remove redundant dtype=float32
* Fix multiple outputs
* Use check_function in the rest of the test_top_level1
This PR moves the helper function checking nnvm operations against numpy reference implementations from test_top_level# to nnvm.testing, and adds the ability to compare symbolic gradients to numerically computed gradients. It also fixes a wrong gradient implementation for the division operation (which was found using this approach). Currently only the tests that used the old helper function have been rewritten to use the new function. There are still many tests which didn't use the helper function; they should also be rewritten to use this function where possible, but this is left for future work.