Some things I learned in 2023
Another list, now of things I learned in the past months.
Python parallelism
type | memory | start | run | kernel | obs |
---|---|---|---|---|---|
coroutines | very low | very fast | slow | main thread | Cooperative non-preemptive concurrency. Using yield in generators or async/await (or greenlets?) tasks pass control between themselves. Could be run with threads. |
threads | low | fast | slow | kernel threads | Preemptive concurrency. Due to GIL, not real multitasking (parallelism). Same memory context. Like coroutines useful only for non-blocking I/O (network or disk) or non-blocking external function calls. |
subinterpreters | large | slow | fast | kernel threads | Full interpreters with their own GIL. Capable of true multitasking. Communicates through serialized (pickle) messages (objects). In principle capable to access shared memory. Very new, maybe preparing a no-GIL future. |
processes | very large | very slow | fast | processes | Traditional multitasking solution for CPU bound tasks. |
As a comparison virtual threads for Java are the equivalent of coroutines in Java. They allow for low memory, fast to spin-up preemptive multitasking. They are transparently assigned by JVM to pre-spun kernel threads.
What is the deal with Cassandra?
Simply put, best distributed DB. Multipurpose, very scalable, very resilient, open source, SQL compatible, high read and write. It is a columnar DB which allows for compression of sparse datasets and speed-up of some searches. There is a price though.
- No strict ACID. One can tune for the balance between availability (AP) and consistency (CP) depending on the use case.
- It is not relational. The columnar structure does not allow for foreign keys or joins, and many SQL queries are impossible or very slow.
- Data scheme must be carefully chosen (using de-normalization and duplications) in order to allow for the needed usage (queries).
Number of (genetic)ancestors
If one goes \(n\) generations in the past, one has, as a first approximation, \(2^n\) ancestors. This has to break down for times earlier than about 1500CE , because then the number of ancestors becomes larger than the world population. The consequence is that, a european living now, has most of the people living in 1400 in Europe as ancestors. This takes into account that the parents must be some kind of cousins (hopefully more than 5-6 generations removed) and many of the hypothetical \(2^n\) ancestors are the same person. The exact calculation takes into account some migration patterns. About 20% of people living in 1400 have no descendants today. Therefore, I can count as ancestors members of most medieval noble houses (which can be easily verified to have descendants today).
Maybe surprising, I have no genetic relation with most of my ancestors! Indeed, let us say that each child gets half of her DNA from the mother and half from the father. If each half passes as one stretch then she must have DNA from either the maternal grandfather or the maternal grandmother, but not both. Of course, things are more complicated as there are 22 autosomal chromosomes, and there is recombination too, which ensures that her genome has DNA fragments independently sampled from all the grandparents. But even so, looking at ancestors \(n\) generations removed there are about \(K=2(22+33*(n-1))\) ancestors that could have contributed DNA fragments to her genome. So, before 1800, the number of fragments is smaller than the number of ancestors of someone living now. Already, probably one of her grand-grand-grand-grand parents (5 generations removed) did not contribute to her genome. I myself have no known ancestors of noble blood but many people can find one 19’th century noble in their genealogical tree. Unfortunately this is not enough to claim noble blood on genetic bases.
Also, about 30 generations ago, most people in Europe are ancestors, but only 2000 of them contribute gentically.