Thanks for your work on this neat gem.
Running readability on the HTML from https://100wordstory.org/submit/, I expected more markup to remain than readability leaves intact.
Expected
Observed
In the screenshot above, the following content is stripped out:
- the red "Submit" heading:
<h1 class="titles">
<a href="https://100wordstory.org/submit/" rel="bookmark" title="SubmitPermanent Link to ">Submit</a>
</h1>
- the red "Submissions are now open through January 9, 2024" and "Submit!" headings and links:
<h2 style="text-align: center;"><a href="https://100wordstory.submittable.com/submit">Submissions are now open through January 9, 2024!</a></h2>
<h2 style="text-align: center;"><a href="https://100wordstory.submittable.com/submit">Submit!</a></h2>
Turning on debug: true doesn't seem to explain why these items are missing:
% readability -d https://100wordstory.org/submit/
/Users/avk/.rvm/gems/ruby-2.7.8@wbm/gems/ruby-readability-0.7.0/bin/readability:31: warning: calling URI.open via Kernel#open is deprecated, call URI.open directly or use URI#open
Removing unlikely candidate - magnific_popup-css
Removing unlikely candidate - nav superfishmenu-100-word-story-menu
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page menu-item-73menu-item-73
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page current-menu-item page_item page-item-6 current_page_item menu-item-72menu-item-72
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page menu-item-83menu-item-83
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page menu-item-189menu-item-189
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page menu-item-70menu-item-70
Removing unlikely candidate - header
Removing unlikely candidate - comments
Removing unlikely candidate - commentlist clearfix
Removing unlikely candidate - comment even thread-even depth-1 parentcomment-65
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - comment byuser comment-author-100words bypostauthor odd alt depth-2comment-66
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - comment byuser comment-author-100words bypostauthor even thread-odd thread-alt depth-1comment-57
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - comment odd alt thread-even depth-1comment-56
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - comment even thread-odd thread-alt depth-1comment-52
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - sidebar-wrapper
Removing unlikely candidate - sidebar
Removing unlikely candidate - sidebar-box widget_blockblock-3
Removing unlikely candidate - widget_text sidebar-box widget_custom_htmlcustom_html-2
Removing unlikely candidate - sidebar-box widget_texttext-3
Removing unlikely candidate - sidebar-box widget_texttext-4
Removing unlikely candidate - sidebar-box widget_texttext-7
Removing unlikely candidate - sidebar-box widget_linkslinkcat-10
Removing unlikely candidate - footer
Altering div(#pages.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Top 5 candidates:
Candidate div#.post-wrapper with score 51.935052531041066
Candidate div#left-div. with score 16.71186440677966
Best candidate div#.post-wrapper with score 51.935052531041066
Conditionally cleaned div#.addtoany_share_save_container addtoany_content addtoany_content_bottom with weight 25 and content score 0 because it has too short a content length without a single image.
Conditionally cleaned div#.a2a_kit a2a_kit_size_24 addtoany_list with weight 0 and content score 0 because it has too short a content length without a single image.
Conditionally cleaned div#.recentposts with weight 25 and content score 0 because it has too short a content length without a single image.
<div><div>
<p>100 words for your story … no more or no less. Tell a story, pen a slice of your memoir, or try your hand at an essay.</p>
<p>You get 100 words—exactly 100 words—which is both the pain and the pleasure here. It’s short, you tell yourself. You could write 100 words at a bus stop, on your lunch break, in your sleep. But with 100 words you must tell the whole story in its entirety, so it holds together like a perfect little doll house. (Your title is not part of the 100 words.)</p>
<p>Please include a short bio (25 words, max!) with your submission. Also, did we say exactly 100 words? We weren’t kidding! We count words according to Microsoft Word’s word-count tally. Also, make friends with your spell-check, or have a friend proofread your story.</p>
<p>We currently charge a $2 submission fee, the minimum in order to cover the costs of the submission system.</p>
<p> </p>
<p> </p>
</div></div>
Any ideas on how to broaden or include this content?
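One possible explanation, which the debug output above does not confirm: every missing heading consists entirely of link text. Readability-style extractors commonly drop headings whose link density (link text length divided by total text length) exceeds some threshold, and if this gem does the same, it would do so without printing a debug line. The sketch below uses only the Ruby standard library and a rough regex-based tag stripper to compute that ratio for one of the stripped headings. The helper names `strip_tags` and `link_density` are hypothetical and are not part of the gem's API; whether the gem applies such a rule, and with what threshold, would need to be checked against its source.

```ruby
# Rough illustration only: regex-based tag stripping, not a real HTML parser.
TAG = /<[^>]+>/

# Remove anything that looks like an HTML tag, leaving the text content.
def strip_tags(html)
  html.gsub(TAG, "")
end

# Ratio of text inside <a> elements to all text in the fragment (0.0 to 1.0).
def link_density(html)
  text = strip_tags(html).strip
  return 0.0 if text.empty?

  link_text = html.scan(%r{<a\b[^>]*>(.*?)</a>}m)
                  .flatten
                  .map { |inner| strip_tags(inner) }
                  .join
                  .strip
  link_text.length.to_f / text.length
end

# One of the headings reported missing from the output.
heading = '<h2 style="text-align: center;">' \
          '<a href="https://100wordstory.submittable.com/submit">Submit!</a></h2>'

puts link_density(heading) # prints 1.0
```

A ratio of 1.0 means the heading is nothing but a link, so any link-density cutoff would remove it. If that is what happens here, widening the `:tags` and `:attributes` options described in the gem's README would probably not be enough on its own to bring these headings back.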