Complete Digitization of Leonardo da Vinci's Codex Atlanticus

(openculture.com)

121 points | by emmelaich 16 hours ago

38 comments

  • felixbraun 19 minutes ago

    Excellent work—this reminds me of similar projects built with the same tech stack:

    – Coins: A journey through the Münzkabinett Berlin collection (one of the largest in the world). https://uclab.fh-potsdam.de/coins/

    – Theodor Fontane Marginalia: A visualization of Fontane’s marginalia and notes in his personal library. https://uclab.fh-potsdam.de/ff/

  • WillAdams 7 hours ago

    Interesting UI --- wants a full-screen mode and 2-up view and a way to remove all the chrome/UI....

    An earlier example of this sort of thing was Bill Gates' purchase of the Codex Leicester https://en.wikipedia.org/wiki/Codex_Leicester which was then digitized and released on a CD-ROM by Corbis:

    https://en.wikipedia.org/wiki/Leonardo_da_Vinci_(video_game)

    which was quite engaging, but sadly trapped in the technology of the time --- anyone know of an updated version of it?

  • mzs 5 hours ago
  • Isamu 8 hours ago

    If you get an opportunity to see them in person, it’s worth it because the fine details are that much more impressive up close. Every photo I’ve seen is not as good. Also the illustration is tinier than you would think.

  • whatever1 10 hours ago

    How much talent can fit in a person? This is how much.

    • nunodonato 8 hours ago

      indeed! The biography of Leonardo was an amazing read. Highly recommend it

      • proee 6 hours ago

        Can you recommend the author?

        • nunodonato 5 hours ago

          https://en.wikipedia.org/wiki/Leonardo_da_Vinci_(Isaacson_bo...

          The same author who wrote some other famous biographies. I know some people prefer other DaVinci's biographies. I didn't read others to be able to compare, but I really enjoyed this one.

          • kragen 3 hours ago

            Nitpick: "da Vinci" wasn't our homeboy's name. That just means "from Vinci". He was "Leonardo", like many other people, so we added "da Vinci" to clarify which Leonardo we meant, just like you might say, "Jessica from church came by," to clarify that you didn't mean Jessica the ex-girlfriend. Surnames weren't very widely used in Italy then.

            It's like "Jesus of Nazareth"; you wouldn't talk about "other OfNazareth's biographies". Ain't grammatical.

            • cma 2 hours ago

              It's fine. John Smith once meant the John who works as a blacksmith etc. Whatever the original meaning we now widely take da Vinci to be the last name if we don't speak Italian.

              • kragen an hour ago

                I agree that the error is common. Try to make new errors instead of repeating common errors.

                • card_zero 17 minutes ago

                  Does this also apply to DiCaprio? His name seems to translate as "the deer's Leonardo", or maybe "the goat's Leonardo". Possibly "son of a goat".

                  Wikipedia says that Leonardo da Vinci was properly Leonardo son of Piero from Vinci son of Antonio son of another Piero son of Guido. I'm not sure that moving to surnames was a mistake, you know.

                  • kragen 8 minutes ago

                    Nope, that's his actual surname. He wasn't born in the 16th century.

  • kragen 13 hours ago

    This is beautiful! I am having some difficulty with the UI; is there a torrent? Images like https://codex-atlanticus.ambrosiana.it/assets/500/000R-1.jpg are too low in resolution for good archival; you can't even read the writing.

    • trvz 13 hours ago

      Manipulate the URL for a higher resolution:

        https://codex-atlanticus.ambrosiana.it/assets/2000/000R-1.jpg
      
      You don't need to depend on others to create a torrent, as bestowed upon you was the power of wget!

        wget https://codex-atlanticus.ambrosiana.it/assets/2000/000R-{1..1119}.jpg
        wget https://codex-atlanticus.ambrosiana.it/assets/2000/000V-{1..1119}.jpg

      • kragen 4 hours ago

        Thanks! On my cellphone not even enough of the UI was working for me to discover those URLs. I suspect a certain amount of error recovery is in order for wgetting all 2238 images. 2000 seems to be the maximum resolution available, which is under 100dpi. A few of the images seem to have been uploaded to https://commons.wikimedia.org/wiki/Category:Codex_Atlanticus.

        There are a couple of scans of a 43-page Italian edition published by Ulrico Hoepli on the Archive: https://archive.org/details/codex-atlanticus-leonardo-da-vin... https://archive.org/details/codex-atlanticus-leonardo-da-vin... but they seem to be of very poor quality.

        I'm done downloading now (with a sleep of 1 second between pages), and I have 1064125470 bytes of JPEG files, a very reasonably torrentable size. I'll see if I can put together a torrent and upload to the Archive and Commons...
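
        A minimal sketch of the kind of polite, resumable download described above (not the exact commands that were run). It assumes the /assets/2000/ URL pattern from earlier in the thread holds for all 1119 folios; the names fetch_page, download_all, and failed.txt are made up for illustration.

```shell
# fetch_page URL FILE: download one image (hypothetical helper; swap in curl if preferred).
fetch_page() { wget --quiet --tries=3 --output-document="$2" "$1"; }

# download_all BASE_URL LAST_PAGE [DELAY_SECONDS]
# Fetches recto (R) and verso (V) of every page into the current directory.
download_all() {
  local base=$1 last=$2 delay=${3:-1} page side file
  for ((page = 1; page <= last; page++)); do
    for side in R V; do
      file="000${side}-${page}.jpg"
      [ -s "$file" ] && continue        # already downloaded: skip on a rerun
      if ! fetch_page "$base/$file" "$file"; then
        rm -f "$file"                   # drop the partial file so a rerun retries it
        echo "$file" >> failed.txt      # keep a list of pages that need another look
      fi
      sleep "$delay"                    # pause between requests to go easy on the server
    done
  done
}

# Usage (assumed URL pattern from the thread):
#   download_all "https://codex-atlanticus.ambrosiana.it/assets/2000" 1119 1
```

        Rerunning the same call only fetches files that are missing or empty, which is about as much error recovery as 2238 small requests probably need.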

      • WithinReason 9 hours ago

        Or in PowerShell on Windows:

          1..1119 | % { iwr "https://codex-atlanticus.ambrosiana.it/assets/2000/000R-$_.jpg" -OutFile "000R-$_.jpg" }
          1..1119 | % { iwr "https://codex-atlanticus.ambrosiana.it/assets/2000/000V-$_.jpg" -OutFile "000V-$_.jpg" }

        • embedding-shape 5 hours ago

          Some people around me swear PowerShell has better user experience than unix shells, but then I keep seeing examples like these. How on earth could people prefer this compared to `wget https://codex-atlanticus.ambrosiana.it/assets/2000/000V-{1.....`?

          • kragen 4 hours ago

            In this case presumably the main difference is not PowerShell vs. bash but iwr vs. wget? Because I think this is roughly equally bad (untested):

                for page in {1..1119}; do
                    iwr "https://codex-atlanticus.ambrosiana.it/assets/2000/000R-$page.jpg" -OutFile "000R-$page.jpg"
                    iwr "https://codex-atlanticus.ambrosiana.it/assets/2000/000V-$page.jpg" -OutFile "000V-$page.jpg"
                done
            
            Also until recently bash didn't have {42..53} syntax. You had to use `seq`. There was an alternative name for `seq` in Unix Power Tools, `jot`, because it wasn't standard: https://docstore.mik.ua/orelly/unix/upt/ch45_11.htm. This section was by ORA author and sysadmin Linda Mui (https://www.oreilly.com/pub/au/268), but I don't know if she wrote `jot` or just popularized it.

      • NoMoreNicksLeft 12 hours ago

        Any idea on how to best compile it to an ebook? Just stuffing the jpgs into a pdf rarely works well...

        • ticulatedspline an hour ago

          Easy way would be to just drop them in a zip and label it .cbz. Most readers handle CBR/CBZ just fine.

          • kragen 8 minutes ago

            Oh, is .cbz that simple? Does it use the file order of the zipfile members or some other order?

            It may be useful to use zip -Z store. JPEG data isn't going to get much benefit from another layer of LZ77.
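
            A rough sketch of one way to sidestep the ordering question, on the assumption (not verified here) that most readers sort pages by file name: copy the images to zero-padded names that sort in reading order, then store them uncompressed. padded_name and make_cbz are hypothetical helper names; zip's -0 is the short form of store-only.

```shell
# padded_name SIDE PAGE: a name that sorts in reading order, e.g. 0007-R.jpg
# (R sorts before V, so each recto lands before its verso).
padded_name() { printf '%04d-%s.jpg' "$2" "$1"; }

# make_cbz LAST_PAGE OUTPUT.cbz
# Expects the downloaded 000R-N.jpg / 000V-N.jpg files in the current directory.
make_cbz() {
  local last=$1 out=$2 stage page side src
  stage=$(mktemp -d)                    # staging copy; needs about as much free space as the images
  for ((page = 1; page <= last; page++)); do
    for side in R V; do
      src="000${side}-${page}.jpg"
      [ -f "$src" ] && cp "$src" "$stage/$(padded_name "$side" "$page")"
    done
  done
  # -0: store without compression; -j: drop the staging directory from member names.
  zip -0 -j -q "$out" "$stage"/*.jpg
  rm -rf "$stage"
}

# Usage:
#   make_cbz 1119 codex-atlanticus.cbz
```

            Because the glob expands in sorted order, the zip members also end up in reading order, so a reader that follows member order instead should see the same sequence.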

        • foofoo12 9 hours ago

          I usually do exactly what rarely works well for you, but it works decently for me. You get one page per image, and the image isn't recompressed or touched at all.

            apt install img2pdf
            img2pdf *.jpg -o leonardo-da-book.pdf

          • nunodonato 9 hours ago

            Wouldn't this mess up the order? I think you are supposed to view it like R1, V1, R2, V2, etc.

            • foofoo12 8 hours ago

              Yes, this was just an example. Using wildcard expansion will give you whatever order your current shell sees fit. Bash does alphabetical order.

              • kragen 3 hours ago

                More like

                    echo $(for page in {1..1119}; do for side in R V; do
                      echo "000$side-$page.jpg"; done; done)
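
                The same list can be handed straight to img2pdf. A small sketch, assuming the files sit in the current directory under the names used above; ordered_pages and the output file name are made up for illustration.

```shell
# ordered_pages LAST_PAGE: print file names in reading order (R1, V1, R2, V2, ...).
ordered_pages() {
  local last=$1 page side
  for ((page = 1; page <= last; page++)); do
    for side in R V; do
      echo "000${side}-${page}.jpg"
    done
  done
}

# Usage (relies on word splitting, which is safe here because the names contain no spaces):
#   img2pdf $(ordered_pages 1119) -o codex-atlanticus.pdf
```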

        • c0balt 12 hours ago

          I haven't done this in some time, but templating some markdown code for pandoc and creating an epub might be a viable avenue.

          • kragen 3 hours ago

            Maybe what rarely works well for NoMoreNicksLeft is having a gigabyte of JPEGs in a single HTML chapter inside the epub? In that case you could do something like divide the files into 373 "chapters" of 6 pages each?

            One of the fragmentary editions I linked on the Archive uses the .cbr Comic Book Reader format; perhaps that is a better format than .epub for high-resolution scans of every page?

          • NoMoreNicksLeft 6 hours ago

            Oooh... I have even less luck with epub, when the pages are an image-per-page.

        • atoav 7 hours ago

          Calibre comes with an ebook-convert command; that one might work.

        • eMPee584 10 hours ago

          ocrmypdf (rocks!)

  • nunodonato 9 hours ago

    Amazing! The categorization is nice, but I would love to see some sort of "tag cloud" that would allow us to view more specific content. How long until someone creates a tool to RAG the hell out of this? :)

  • vim-guru 11 hours ago

    Why are some of the pages upside down?

    • embedding-shape 5 hours ago

      It's a bit bananas, but probably just because he could. He also wrote his personal notes in "mirror writing":

      > The notes on Leonardo da Vinci's famous Vitruvian Man image are in mirror writing. Leonardo da Vinci wrote most of his personal notes in mirror writing, only using standard writing if he intended his texts to be read by others

      https://en.wikipedia.org/wiki/Mirror_writing

    • foofoo12 9 hours ago

      Da Vinci was showing off.

    • b34k3r 10 hours ago

      just rotate your monitor

  • MangoToupe 3 hours ago

    > We use it to express mild surprise that one person could use both their left and right hemispheres equally well.

    When did this myth become so perpetuated? It's infuriating. I blame university administration. I can't think of any other reason to so firmly distinguish different areas of thought.

  • NaomiLehman 2 hours ago

    I'm training a model based on this /s