Age | Commit message | Author
2017-04-19 | more consistent crawl frequency | Matt Singleton
2017-04-19 | add favicons and write intermediate images to scratch | Matt Singleton
2017-04-19 | fix guardian images and image scaling | Matt Singleton
2017-04-17 | replace print statements with the logging module | Matt Singleton
2017-04-17 | pull the images locally and resize | Matt Singleton
2017-04-17 | clean up template formatting | Matt Singleton
2017-04-17 | use jinja templates to build the output | Matt Singleton
2017-04-17 | don't need scratch dir anymore | Matt Singleton
2017-04-17 | fix fox urls | Matt Singleton
2017-04-17 | read the scratch dir path on the command line | Matt Singleton
2017-04-17 | get it to run from the package | Matt Singleton
2017-04-16 | move files around for packaging reasons | Matt Singleton
2017-04-16 | take webroot as a command line argument | Matt Singleton
2017-03-30 | Changed CSM to their /USA page | ssstvinc2
2017-03-27 | minor fixes | ssstvinc2
2017-03-24 | Added ABC News, some parser fixes as well | ssstvinc2
2017-03-23 | added washington times | ssstvinc2
2017-03-23 | Fixed H1 parsing for None types. Should resolve further crashes. | ssstvinc2
2017-03-23 | reworked main loop to hopefully prevent crashing | ssstvinc2
2017-03-10 | Made removeBadStories modular and reduced crashes by checking for nonetypes | ssstvinc2
2017-03-09 | reframed images to crop off Guardian's bottom blue line | ssstvinc2
2017-03-09 | Reworked Guardian | ssstvinc2
2017-03-06 | Added spotCheck ability. Other minor tweaks | ssstvinc2
2017-02-24 | minor tweaks, re-enabled NYT and GDN | ssstvinc2
2017-02-19 | file cleanup | ssstvinc2
2017-02-19 | fixed issue where links would also pop open a new instance of homepage | ssstvinc2
2017-02-19 | file cleanup | ssstvinc2
2017-02-19 | Fixed bounding box on h1s | ssstvinc2
2017-02-19 | Print output formatting | sstvinc2
2017-02-18 | Added The Hill; also tweaked buildArticle() | sstvinc2
2017-02-18 | Fixed NPR parsing; put NYT back in; Mobile CSS | sstvinc2
2017-02-16 | Added NPR | sstvinc2
2017-02-16 | Grace made it prettier | sstvinc2
2017-02-16 | Added in title checks for article removal | sstvinc2
2017-02-16 | Some parsing tweaks, mostly for The Guardian | sstvinc2
2017-02-16 | Added The Guardian to sources | sstvinc2
2017-02-16 | More parsing fixes, more bad article flagging | sstvinc2
2017-02-16 | Pulled NYT again; minor fixes for NBC, Blaze | sstvinc2
2017-02-15 | Fixed NYT, plus other parsing fixes and a minor visual tweak | sstvinc2
2017-02-15 | Changed randomization algorith for H2 and H3; fully implemented H3 | sstvinc2
2017-02-15 | The Blaze added to new parser; also fixed Blaze desription fields | sstvinc2
2017-02-15 | Added CBS to new parser | sstvinc2
2017-02-15 | Added NBC to new parser | sstvinc2
2017-02-15 | Added BBC to new parser | sstvinc2
2017-02-15 | Added author name to Article class; added removeBadArticles() filter function | sstvinc2
2017-02-14 | Weekly Standard now uses new parser | sstvinc2
2017-02-14 | modularized code a bit, and added Fox News with new parser | sstvinc2
2017-02-14 | New parsing method started | sstvinc2
2017-02-13 | Better URL obfuscation method | sstvinc2
2017-02-13 | New layout implemented, plus a CBS fix | sstvinc2