ARROW-6315: [Java] Make change to ensure flatbuffer reads are aligned#5229
ARROW-6315: [Java] Make change to ensure flatbuffer reads are aligned#5229tianchen92 wants to merge 4 commits intoapache:ARROW-6313-flatbuffer-alignmentfrom
Conversation
|
@tianchen92 it looks like this breaks the ArrowTools test |
It's really strange, I believe it works fine with maven build on my local machine, is there any difference with travis? |
|
That is interesting, your change passes on my box as well. Let me try rerunning CI. |
|
@ursabot build |
BryanCutler
left a comment
There was a problem hiding this comment.
Thanks for the pr @tianchen92 . I'll have to look at it more closely again, but I think when you deserialize you check for the continuation that is not there because it has already been read.
@BryanCutler Thanks a lot for your feedback, I left some comments above and updated this PR.
|
|
Looks like I can reproduce the error locally, I'll look into it. |
Awesome, thanks a lot for your help! |
|
@BryanCutler do you have a sense if as part of this change the old/new writing/reading code paths in Java need to be controllable by an environment variable for use with Spark? I think we need something like that for Python? |
|
I just found the root cause of the error and can post a patch soon.
Maybe, I'm not entirely sure. The case that I think might fail is when a user has pyarrow 0.15.0 and using Spark <= 2.4.4 with Arrow Java 0.14.1. It should work going Java -> Python, but then going Python -> Java, it will write the 8-byte prefix by default and Java will choke. So an env var would help in that case I think. |
…after writing This change prevents a race condition where the `EchoServer` closes the client socket immediately after writing batches and the client may have not finished reading from the socket. The socket was being closed in three places: `ArrowStreamReader.close()` and `ArrowStreamWriter.close()` when being AutoClosed, and when the `ClientConnection` went out of scope. The race condition was prevalent in #5229 and consitently failing the `EchoServerTests.basicTest`. Closes #5288 from BryanCutler/java-EchoServer-prevent-close-6461 and squashes the following commits: 3763cbc <Bryan Cutler> Prevent EchoServer from closing the client socket after writing batches Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
|
@tianchen92 #5288 is merged, please rebase when you can and hopefully this will be good to go |
There was a problem hiding this comment.
Hmm, this doesn't seem quite right. The padded bytes are included in the messageLength, then that length is read using ReadChannel.readFully so reader.bytesRead() should include the padded bytes as well
There was a problem hiding this comment.
Oh wait, is this the additional 4 bytes sent at the end of stream? What happens if the socket isn't closed and then another stream or other data is sent over it? I think that extra 4 bytes will still need to be read and cause problems for the next stream.
@wesm @emkornfield , does C++ ignore the last 4-bytes at the EOS, since now it is an 8-byte value?
There was a problem hiding this comment.
Good catch, I looked again and I think you are right, the additional 4 bytes should be read.
In this case, may be we should also pass option to reader and read the additional 4 bytes if necessary?
There was a problem hiding this comment.
It looks like the C++ implementation only reads the first 4 bytes:
How else are we supposed to know if we should read 4 or 8 bytes? Maybe for the new format, an 8-byte EOS token should look like {0xFFFFFFFF, 0x00000000}, so we read the continutation token first, and then know to read the next 4 bytes, which are then 0 to signal EOS.
There was a problem hiding this comment.
Could the reader just remember the state, so if it reads the continuation
token from the beginning, then read all 8 bytes at the end?
5953b99 to
13b7481
Compare
Thanks Bryan, I have rebased and let's see if it could pass :) Ah, I made a mistake, this branch is not based on master... |
…after writing This change prevents a race condition where the `EchoServer` closes the client socket immediately after writing batches and the client may have not finished reading from the socket. The socket was being closed in three places: `ArrowStreamReader.close()` and `ArrowStreamWriter.close()` when being AutoClosed, and when the `ClientConnection` went out of scope. The race condition was prevalent in apache#5229 and consitently failing the `EchoServerTests.basicTest`. Closes apache#5288 from BryanCutler/java-EchoServer-prevent-close-6461 and squashes the following commits: 3763cbc <Bryan Cutler> Prevent EchoServer from closing the client socket after writing batches Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
13b7481 to
1eb71d2
Compare
|
Looks like the previous test failure is passing now. Integration tests have lots of failures with JS and Go, but Java <-> C++ look fine. Are there known issues with JS/Go integration, or maybe they don't have the protocol changes yet? Also, I think from the discussion here #5229 (comment) this is going to be a problem. We could fix this as a followup, since it seems C++ is doing the same thing? What do you guys think? |
|
JS hasn't been checked in. Go has been checked so we might need to investigate if that is failing. |
|
It seems like all the errors on the integration test are due to JS. |
|
I'm going to merge, lets follow-up on the terminal 8 bytes on the mailing list. |
Implements the IPC message format alignment changes for [ARROW-6315](https://issues.apache.org/jira/browse/ARROW-6315). i. MessageReader can read messages with the old alignment ii. ArrowWriter could choose produces messages with the new alignment or the old format. Closes #5229 from tianchen92/ARROW-align-java and squashes the following commits: 1eb71d2 <Bryan Cutler> ARROW-6461: Prevent EchoServer from closing the client socket after writing cd4fd05 <tianchen> fix small bugs 9a690e4 <tianchen> fix comments and styles 5ee858c <tianchen> Make change to ensure flatbuffer reads are aligned Lead-authored-by: tianchen <niki.lj@alibaba-inc.com> Co-authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
|
NVM, I missed that the changes were going into an integration branch. |
|
Closing to avoid confusinon. |
Implements the IPC message format alignment changes for [ARROW-6315](https://issues.apache.org/jira/browse/ARROW-6315). i. MessageReader can read messages with the old alignment ii. ArrowWriter could choose produces messages with the new alignment or the old format. Closes #5229 from tianchen92/ARROW-align-java and squashes the following commits: 1eb71d2 <Bryan Cutler> ARROW-6461: Prevent EchoServer from closing the client socket after writing cd4fd05 <tianchen> fix small bugs 9a690e4 <tianchen> fix comments and styles 5ee858c <tianchen> Make change to ensure flatbuffer reads are aligned Lead-authored-by: tianchen <niki.lj@alibaba-inc.com> Co-authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Implements the IPC message format alignment changes for [ARROW-6315](https://issues.apache.org/jira/browse/ARROW-6315). i. MessageReader can read messages with the old alignment ii. ArrowWriter could choose produces messages with the new alignment or the old format. Closes #5229 from tianchen92/ARROW-align-java and squashes the following commits: 1eb71d2 <Bryan Cutler> ARROW-6461: Prevent EchoServer from closing the client socket after writing cd4fd05 <tianchen> fix small bugs 9a690e4 <tianchen> fix comments and styles 5ee858c <tianchen> Make change to ensure flatbuffer reads are aligned Lead-authored-by: tianchen <niki.lj@alibaba-inc.com> Co-authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Implements the IPC message format alignment changes for [ARROW-6315](https://issues.apache.org/jira/browse/ARROW-6315). i. MessageReader can read messages with the old alignment ii. ArrowWriter could choose produces messages with the new alignment or the old format. Closes apache#5229 from tianchen92/ARROW-align-java and squashes the following commits: 1eb71d2 <Bryan Cutler> ARROW-6461: Prevent EchoServer from closing the client socket after writing cd4fd05 <tianchen> fix small bugs 9a690e4 <tianchen> fix comments and styles 5ee858c <tianchen> Make change to ensure flatbuffer reads are aligned Lead-authored-by: tianchen <niki.lj@alibaba-inc.com> Co-authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
…after writing This change prevents a race condition where the `EchoServer` closes the client socket immediately after writing batches and the client may have not finished reading from the socket. The socket was being closed in three places: `ArrowStreamReader.close()` and `ArrowStreamWriter.close()` when being AutoClosed, and when the `ClientConnection` went out of scope. The race condition was prevalent in apache#5229 and consitently failing the `EchoServerTests.basicTest`. Closes apache#5288 from BryanCutler/java-EchoServer-prevent-close-6461 and squashes the following commits: 3763cbc <Bryan Cutler> Prevent EchoServer from closing the client socket after writing batches Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
Implements the IPC message format alignment changes for [ARROW-6315](https://issues.apache.org/jira/browse/ARROW-6315). i. MessageReader can read messages with the old alignment ii. ArrowWriter could choose produces messages with the new alignment or the old format. Closes apache#5229 from tianchen92/ARROW-align-java and squashes the following commits: 1eb71d2 <Bryan Cutler> ARROW-6461: Prevent EchoServer from closing the client socket after writing cd4fd05 <tianchen> fix small bugs 9a690e4 <tianchen> fix comments and styles 5ee858c <tianchen> Make change to ensure flatbuffer reads are aligned Lead-authored-by: tianchen <niki.lj@alibaba-inc.com> Co-authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Implements the IPC message format alignment changes for ARROW-6315.
i. MessageReader can read messages with the old alignment
ii. ArrowWriter could choose produces messages with the new alignment or the old format.