Senior Program Manager Anne Gomez leads the New Readers initiative, where she works on ways to better understand the barriers that prevent people around the world from accessing information online. One of her areas of interest is offline access, as she works with the New Readers team to improve the way people who have limited or infrequent access to the Internet can reach free and open knowledge. Over the coming months, Anne will be interviewing people who work to remove access barriers for people across the world.

In her first conversation for the Wikimedia Blog, Anne chats with Emmanuel Engelhart (aka “Kelson”), a developer who works on Kiwix, open source software that allows users to download web content for offline reading. In the eleven years since it was invented, a number of organizations have made use of it, including World Possible and Internet in a Box. Still, it is perhaps best known for its distribution of entire copies of Wikipedia in areas of low bandwidth, like Cuba.

Here is the coordination page for the offline Project Gutenberg work. One of the problems is that even on Gutenberg, we don't have all the most important books of French literature.

The planned pipeline (steps 1–2 are sketched in Python at the end of these notes):

1. Loop through the folders/files and parse the RDF.
2. Query the database to reflect the filters and get the list of books.
3. Download the books based on the filters (formats, languages).
4. Generate a static folder repository of all ePUB files.
5. Generate a zimwriterfs-friendly folder of static HTML files based on templates and the list of books.

Getting the code:

git clone git://.net/p/kiwix/other kiwix-other

The best Goobuntu-packaged option for the build dependencies seems to be:

sudo apt-get install libzim-dev liblzma-dev libmagic-dev autoconf automake

Notes on fetching the data (work done by didier chez … and cniekel chez …):

- Gutenberg supports rsync (rsync -av --del /var/…). That was the source; the generated data: rsync -av --del … /var/www/gutenberg-generated. If I cd into gutenberg-generated, there is stuff like: …
- To get epub+text+html, you'll need both rsync trees, which seems quite inconvenient.
- wget works; the result contains 30k directories, each with an RDF file: every directory has one file with the RDF description of one book.
- So a caching fetch-by-URL seems more convenient; the RDF file contains the timestamp, which could be compared so that updates to a book will be caught.
- So an on-disk-caching, robots-obeying URL retriever needs to be made/reused (see the sketch below).
- If you can somehow filter which books to fetch (language only, book range), that will be convenient.
- Emmanuel suggests the scraper should download everything into one directory, then convert the data into an output directory, then zim-ify that directory (see the last sketch below).
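To make pipeline steps 1–2 more concrete, here is a minimal Python sketch that walks the RDF catalog described above (one directory per book, one RDF file each), pulls a few fields into SQLite, and then queries it with a language filter. The namespace URIs, element names, and the local catalog path are my assumptions about the public Gutenberg catalog layout, not something specified in the notes above.

```python
# Sketch only: parse the per-book RDF files and fill a small SQLite database
# that can later be queried with language/format filters.
import os
import sqlite3
import xml.etree.ElementTree as ET

# Assumed namespaces used by the Gutenberg catalog RDF files.
NS = {
    "dcterms": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

def parse_rdf(path):
    """Best-effort extraction of (book id, title, language) from one RDF file."""
    root = ET.parse(path).getroot()
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        return None
    book_id = ebook.get("{%s}about" % NS["rdf"], "")
    title = ebook.findtext("dcterms:title", default="", namespaces=NS)
    lang = ""
    lang_node = ebook.find("dcterms:language", NS)
    if lang_node is not None:
        value = lang_node.find(".//rdf:value", NS)
        if value is not None and value.text:
            lang = value.text
    return (book_id, title, lang)

def build_db(catalog_dir, db_path="books.db"):
    """Walk the unpacked catalog (~30k directories) and fill a books table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS books (id TEXT, title TEXT, language TEXT)")
    for dirpath, _dirs, files in os.walk(catalog_dir):
        for name in files:
            if name.endswith(".rdf"):
                row = parse_rdf(os.path.join(dirpath, name))
                if row:
                    conn.execute("INSERT INTO books VALUES (?, ?, ?)", row)
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = build_db("rdf-files")  # hypothetical path to the unpacked RDF catalog
    # Step 2: query the database to reflect filters, e.g. French-language books only.
    for book_id, title in conn.execute("SELECT id, title FROM books WHERE language = 'fr'"):
        print(book_id, title)
```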
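The notes above also call for an on-disk-caching, robots-obeying URL retriever. A minimal sketch using only the Python standard library could look like the following; the cache directory, user-agent string, and error handling are illustrative choices rather than the project's actual implementation, and it leaves out the RDF-timestamp comparison mentioned above (one simple option would be to store that timestamp next to each cached file and re-fetch when it changes).

```python
# Sketch of a caching fetch-by-URL helper that obeys robots.txt.
import hashlib
import os
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

CACHE_DIR = "cache"               # hypothetical on-disk cache directory
USER_AGENT = "gutenberg-offline"  # hypothetical agent name for robots.txt checks
_robot_parsers = {}

def allowed(url):
    """Check robots.txt for the URL's host, keeping one parser per host."""
    parts = urlparse(url)
    host = parts.scheme + "://" + parts.netloc
    rp = _robot_parsers.get(host)
    if rp is None:
        rp = urllib.robotparser.RobotFileParser(host + "/robots.txt")
        rp.read()
        _robot_parsers[host] = rp
    return rp.can_fetch(USER_AGENT, url)

def fetch(url):
    """Return the body of url, reusing the on-disk cache when a copy exists."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    cached = os.path.join(CACHE_DIR, hashlib.sha1(url.encode("utf-8")).hexdigest())
    if os.path.exists(cached):
        with open(cached, "rb") as f:
            return f.read()
    if not allowed(url):
        raise RuntimeError("robots.txt disallows fetching " + url)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        data = response.read()
    with open(cached, "wb") as f:
        f.write(data)
    return data
```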
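Finally, the last stage of the flow Emmanuel suggests (download directory, then converted output directory, then ZIM) would hand the output directory to zimwriterfs. A rough sketch of driving that step from Python follows; the zimwriterfs option names, the file names (index.html, favicon.png), and the metadata values are assumptions and should be checked against zimwriterfs --help on the build machine.

```python
# Sketch of the zim-ifying step: run zimwriterfs over the generated HTML folder.
import subprocess

def zimify(output_dir, zim_path, title, language="fra"):
    """Package a directory of static HTML (pipeline step 5) into a ZIM file."""
    cmd = [
        "zimwriterfs",
        "--welcome=index.html",   # assumed entry page inside output_dir
        "--favicon=favicon.png",  # assumed favicon file inside output_dir
        "--language=" + language,
        "--title=" + title,
        "--description=Project Gutenberg books for offline reading",
        "--creator=Project Gutenberg",
        "--publisher=Kiwix",
        output_dir,
        zim_path,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    zimify("output", "gutenberg_fr.zim", "Project Gutenberg (French)")
```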