Conversation
I think the code part of this is ready to merge (just had to find where bed intervals were converted to regions).
Hi @mrvollger, you could test this, but we really need Hao (@38) to have a look, as I don't fully understand the data structures used here.
d4/src/ssio/reader.rs
Outdated
        }
    }
} else {
    break;
I think this introduces a new bug. This is the part that creates a view of the secondary table partitions that contain the data points related to the query.
If we break at this point, I believe it will introduce a bug for some queries on large intervals, namely intervals larger than a secondary table partition.
So I am a little confused: if the issue is just a change of interval convention, what is the problem that needs to be fixed at this point?
To help me understand the change better, could you please give me some data that triggers the buggy output?
Thanks,
Hao
Hi @brentp, just to let you know, I've fixed issue #59 in commit 000f9e6. The expected output was all zero values. I explained roughly what is going on under issue #59. I believe that for this PR you no longer need to change the ssio/reader.rs implementation; simply subtracting one should do the job. Please let me know if you have any questions. Thanks,
ddda3c6 to 561ecd0
d4/src/ssio/reader.rs
Outdated
  if overlap_begin < overlap_end {
-     if overlap_begin == table_ref.begin || self.sfi.is_none() {
+     if overlap_begin + 1 == table_ref.begin || self.sfi.is_none() {
@38 do you think this +1 belongs here? I've forgotten all my previous testing, but I think this might be required.
I don't think we need the +1 at this point. What happens here is that we check whether we need to read a secondary partition from the middle of the stream. When overlap_begin equals the stream's begin, we can read the partition from the beginning, so there is no need to visit the index.
I suspect what you saw previously is unrelated to this part and might be another bug.
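As a rough illustration of the check described above, here is a minimal sketch, not the actual d4 code. `TableRef`, `ReadPlan`, and `plan_reads` are hypothetical names, and `has_index` stands in for `self.sfi.is_some()`. It assumes half-open partitions and shows how a query wider than one partition touches several of them, and when the index would be consulted.

```rust
/// Hypothetical stand-in for a secondary table partition covering [begin, end).
struct TableRef {
    begin: u32,
    end: u32,
}

/// How a partition that overlaps the query would be read (illustrative only).
#[derive(Debug, PartialEq)]
enum ReadPlan {
    /// Read the partition's stream from its first position.
    FromStart,
    /// The overlap starts mid-partition and an index exists: seek through the index.
    ViaIndex,
}

/// For each partition overlapping [query_begin, query_end), decide how to read it.
/// `has_index` plays the role of `self.sfi.is_some()` in the reviewed code (an assumption).
fn plan_reads(
    tables: &[TableRef],
    query_begin: u32,
    query_end: u32,
    has_index: bool,
) -> Vec<(usize, ReadPlan)> {
    let mut plans = Vec::new();
    for (i, t) in tables.iter().enumerate() {
        // Clamp the query to this partition.
        let overlap_begin = t.begin.max(query_begin);
        let overlap_end = t.end.min(query_end);
        if overlap_begin < overlap_end {
            // Same shape as the condition under review, without the +1.
            let plan = if overlap_begin == t.begin || !has_index {
                ReadPlan::FromStart
            } else {
                ReadPlan::ViaIndex
            };
            plans.push((i, plan));
        }
        // In this sketch there is no `break` on a miss, so later partitions are still checked.
    }
    plans
}
```

With partitions [0, 100), [100, 200), [200, 300) and a query of [50, 250), only the first partition is entered mid-stream; the other two start at their own `begin` and are read from the start.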
I've reverted all changes in ssio/reader.rs
d4 uses region syntax like chr1:1-100, which is generally 1-based, so this region would translate to chr1\t0\t100 in BED format. The internals of d4 are unclear to me, but I think this change is a start.
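For reference, the conversion described above can be sketched as follows. `region_to_bed` is a hypothetical helper written for illustration, not a function from d4. It assumes the region string is 1-based and inclusive, and returns a 0-based, half-open interval as BED uses.

```rust
/// Parse a 1-based, inclusive region such as "chr1:1-100" into
/// (chrom, begin, end) with a 0-based, half-open interval.
/// Hypothetical helper for illustration; not part of d4.
fn region_to_bed(region: &str) -> Option<(String, u64, u64)> {
    // Split on the last ':' so contig names that contain ':' still parse.
    let (chrom, range) = region.rsplit_once(':')?;
    let (start, end) = range.split_once('-')?;
    let start: u64 = start.parse().ok()?;
    let end: u64 = end.parse().ok()?;
    // A 1-based start of 0 is invalid, and so is a reversed range.
    if start == 0 || end < start {
        return None;
    }
    // Only the start shifts by one: an inclusive 1-based end equals
    // the exclusive 0-based end.
    Some((chrom.to_string(), start - 1, end))
}
```

Calling `region_to_bed("chr1:1-100")` yields `("chr1", 0, 100)`, which matches the BED line in the description. Only the start coordinate shifts by one.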
@38 could you have a look at the changes in ssio/reader.rs? The break might not work if things are not sorted (though they appear to be). The change to the if statement at lines 136/137 is related to the +/- 1 used internally within d4 (which I don't yet understand).
This change passes all tests and fixes the problems seen in #59.