I think I found a bug in the way the scoring is done.
For example, an article page from CNN:
DEBUG:root:Candidate p#cnnContentContainer.cnn_storyarea with score 163.5
DEBUG:root:Candidate p#.cnn_contentarea with score 138.0
DEBUG:root:Candidate p#cnnContainer. with score 118.5
DEBUG:root:Candidate body#. with score 113.5
DEBUG:root:Candidate p#.cnn_strycntntlft with score 111.0
All five of these candidates are nested inside each other (body#->p.*), so each parent's score includes the text of its children. As a result, the output contains too much text that is not needed.
One idea would be to remove child nodes from the parent before calculating its score.
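
To make the idea concrete, here is a rough sketch of what I mean. It is not the library's actual scoring code: it uses the standard library's xml.etree.ElementTree, plain text length stands in for the real score, and the names full_text_length, own_text_length and CANDIDATE_TAGS are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: tags that can become scoring candidates (hypothetical list).
CANDIDATE_TAGS = ("body", "div", "p")

def full_text_length(node):
    """Text length of node including all descendants (how nesting inflates parents)."""
    return len("".join(node.itertext()))

def own_text_length(node, candidate_tags=CANDIDATE_TAGS):
    """Text length of node, skipping child subtrees that are candidates themselves."""
    total = len(node.text or "")
    for child in node:
        if child.tag not in candidate_tags:
            # Inline children such as <a> still belong to this node's text.
            total += own_text_length(child, candidate_tags)
        # Text following the child sits directly inside this node.
        total += len(child.tail or "")
    return total

doc = ET.fromstring(
    "<body><div>nav<div><p>read <a>this</a> now</p></div></div></body>"
)
outer = doc[0]    # <div> holding the "nav" text
inner = outer[0]  # nested <div> wrapping the paragraph
para = inner[0]   # <p> with the article text

for name, node in [("body", doc), ("outer", outer), ("inner", inner), ("p", para)]:
    print(name, full_text_length(node), own_text_length(node))
```

With the nested text left out, the wrapper elements no longer inherit the paragraph's length, so only the node that directly holds the article text stands out. A real fix would probably still need to pass part of a child's score up to its parent, so take this only as an illustration of the direction.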