- Notifications
You must be signed in to change notification settings - Fork 36
actually remember individuals #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Use the PedigreeID() method to access the pedigree id, rather than accessing pedigree_id_ directly; that's a private ivar. Your build didn't fail because you're building SLiMgui, which has special access to ivars all over the place to make life simpler. If you build the SLiM target instead, your build will fail when you use private ivars like this. Nice that Travis-CI automatically checks pull requests, I didn't know it would do that! :-> |
| I'm not sure why the reset to -1 for the individual column would be unneeded now. Simplify may not throw away individual table entries any more if the corresponding nodes are gone; but the same is not true for SLiM, right? There can be nodes in the tree sequence for which (a) SLiM has forgotten the individual because it is dead and gone, and (b) it was not a "remembered individual", so there's no saved individual table entry for it, right? So won't those nodes still need to be given -1? Re: the "which I think is OK but should be noted in the documentation" comment, I don't see that that is an issue at all; like, what would it even mean for it to be otherwise? Remembering an individual is like setting a flag, right? Once the flag is set, setting it again makes no difference; flags can't be "double-set". So in my opinion you can just delete that "NOTE:" comment entirely, I think it's obvious. No other comments at the moment; I haven't reviewed it in fine detail yet, but it looks good to me. Thanks! |
core/slim_sim.cpp Outdated
| // TODO: CHECK THIS IS UNNEEDED?!? | ||
| // reset the individual column to -1 (which is MSP_NULL_INDIVIDUAL) | ||
| memset(p_tables->nodes->individual, 0xff, p_tables->nodes->num_rows * sizeof(individual_id_t)); | ||
| // memset(p_tables->nodes->individual, 0xff, p_tables->nodes->num_rows * sizeof(individual_id_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why the reset to -1 for the individual column would be unneeded now. Simplify may not throw away individual table entries any more if the corresponding nodes are gone; but the same is not true for SLiM, right? There can be nodes in the tree sequence for which (a) SLiM has forgotten the individual because it is dead and gone, and (b) it was not a "remembered individual", so there's no saved individual table entry for it, right? So won't those nodes still need to be given -1?
Ah, this is important. We don't want to recompute tables.nodes.individualeach time around. But, correct me if I am wrong, but I think this is OK. Simplify does throw away individuals with no remaining nodes; but these individual's nodes are guaranteed to be around still. The only individuals kept in tables.individuals are Remembered ones -- others get added to the copy of the tables that get output, but that procedure doesn't touch the underlying tables that get carried forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect I'm missing the point somehow. But p_tables is the output table, and contains all of the nodes in the current tree sequence. The job of this method is to set up individual-related information in p_tables, not just for the remembered and extant individuals, but also for all of the nodes in p_tables. Some of the nodes in p_tables will correspond to individuals that were not remembered and are no longer extant, and so the p_tables->nodes->individual entries for those nodes need to have -1 set for their individual. So the strategy up until now, as I understand it, has been to set p_tables->nodes->individual to all -1, and then to fill in the right values for those nodes that still correspond to an individual in the table. The only way that setting to -1 would be unnecessary would be if every entry in p_tables->nodes corresponded to an entry in the individuals table anyway, in which case every -1 value would end up being overwritten in the loop below. But that is not the case here, as I understand it. No? Am I being dense? Maybe we need to talk about this in person.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the nodes in p_tables will correspond to individuals that were not remembered and are no longer extant, and so the p_tables->nodes->individual entries for those nodes need to have -1 set for their individual.
Ah, I see. If an individual was not remembered, then it will not appear in the individual table. Even if it was alive at the time of a previous output, it never would have been added to tables, only the temporary-tables-for-output.
| Ah, thanks. I'm moving discussion to the code, so we can keep the different thread separate. |
core/slim_sim.cpp Outdated
| | ||
| // add this individual to the individual table if not present already | ||
| // NOTE: this means if an individual is remembered more than once, subsequent times have no effect | ||
| // ... which I think is OK but should be noted in the documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that that is an issue at all; like, what would it even mean for it to be otherwise? Remembering an individual is like setting a flag, right? Once the flag is set, setting it again makes no difference; flags can't be "double-set". So in my opinion you can just delete that "NOTE:" comment entirely, I think it's obvious.
Recall that we store mutable things like age and location.
And, I think I disagree with my former self -- a more consistent behavior would be to store the state at the last time an individual was Remembered. For one thing, it's wierd for subsequent Remembers to have no effect; for another, if an individual is both Remembered and alive, then what gets output is the alive version, which is necessary to allow recovering from the saved state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I misunderstood what you were talking about here. So there is, we agree, no such thing as being "double-remembered" as such; the point is just that each time an individual is remembered its information in the table should be updated to the current value (or not). I agree with your current self: the information ought to be updated, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some time you should show me how you manage to associate discussion threads with particular places in the code of the pull request, that's nice. :->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In "files changed", hovering over a line of code makes a blue plus mark pop up; if so, when you click you can make a comment there.
| Hm, there is one way that the current scheme will not allow complete recovery of saved state: if an individual is both Remembered and alive, but had a different age when Remembered, then currently we have no way of passing on what their age was when Remembered. I don't see how to do this at all reasonably, though; maybe not a big deal? |
| Hmm. In some ways the natural way to preserve that information would be to write the remembered/extant individual to the individuals table twice, once in the remembered section with the remembered info, once in the extant section with the extant info; but of course that does not mesh well with the rest of the design, and is not worth pursuing. I agree that it is not a big deal. The user has no way, in SLiM, of accessing information about remembered state; all that is accessible in SLiM is current state, so that is all that needs to be saved/restored as far as SLiM is concerned. It is possible that someone would want to use rememberIndividuals() to specifically remember individual state at a particular moment in time, with the intention of using that information in analysis in Python, and that getting later information instead would be a problem for them. But I think it's reasonable for us to just say, well, that's not what rememberIndividuals() means. If they want that, they can (a) write a .trees file out at the point in time they're interested in, or (b) write out the information they want in an auxiliary file, at the appropriate point in time, rather than relying on the .trees file for it. So there are solutions for that situation. |
| And how would I have gotten my comment above to be nested under your comment, as a reply? You did that above, somehow, but I see no "reply" option attached to your comment, so I had to start a new unlinked comment. |
| There's been a fair number of upstream changes to tskit, and I've had to merge those in, too. But, it seems to be working, superficially. |
| } | ||
| | ||
| // our tables copy needs to have a population table to be able to sort it | ||
| WritePopulationTable(&tables); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a change: tskit now checks that each node.population refers to a valid entry in the population table.
core/slim_sim.cpp Outdated
| | ||
| void SLiMSim::AddIndividualToTable(Individual *p_individual, table_collection_t *p_tables, bool p_addToRemembered) | ||
| { | ||
| // Maybe this should take a list of individuals instead of a single one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had to do the same thing in two places, so I factored this out.
Maybe it should take a vector of individuals? I wasn't sure how to get that in one place, though.
core/slim_sim.cpp Outdated
| | ||
| // Add in the mutation.parent information | ||
| ret = table_collection_build_indexes(&output_tables, 0); | ||
| if (ret < 0) handle_error("table_collection_build_indexes", ret); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also a new requirement; previously, tskit wasn't being careful enough about when its indexes were invalidated.
core/slim_sim.cpp Outdated
| // This overwrites whatever might be previously in the individual table | ||
| individual_table_clear(p_tables->individuals); | ||
| // First prepend the existing remembered individuals | ||
| individual_table_copy(tables.individuals, p_tables->individuals); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, but I'm being dense. This function is called here, on output_tables, which has just been copied from tables. So these two lines (individual_table_clear and individual_table_copy) are unnecessary. What we want to do is add some more lines to the end of the existing individual table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this method should be called AddCurrentGenerationToIndividuals instead of WriteIndividuals.
| individual_table_clear(p_tables->individuals); | ||
| // First prepend the existing remembered individuals | ||
| individual_table_copy(tables.individuals, p_tables->individuals); | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is called here, on output_tables, which has just been copied from tables. So these two lines (individual_table_clear and individual_table_copy) were unnecessary. What we want to do is add some more lines to the end of the existing individual table.
Maybe this makes more clear that we don't need to erase the existing nodes.individual column?
| } | ||
| | ||
| void SLiMSim::WriteIndividualTable(table_collection_t *p_tables) | ||
| void SLiMSim::AddCurrentGenerationToIndividuals(table_collection_t *p_tables) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now reflecting what this actually does
In the "discussion" section there's just one thread; that's why starting threads in the code is useful. |
63e3f4d to 68c71d1 Compare | This is good to go, I think. I've tidied up the commits somewhat. |
merged remember individuals
68c71d1 to e8b503f Compare
No description provided.