(Context: Written to my maths tutor while racking my brains over real analysis and proofs)
Previously, my algorithm was like this:
I'm a math person, so I should learn math
I should learn from a textbook
Just thinking about it is too hard, never mind
Perhaps an online course/Youtube video series will be easier
Week 1: This is going well! I'm learning so much!
Week 3: I'm completely lost, this is a massive pain, but I need to tough it out
Week 4: Give up
After starting lessons with you, my algorithm was like this:
Lessons are fun!
Even if I've not reviewed anything or done any homework, I still make progress every week (even if more slowly), which creates a positive feedback loop
Continuous progress means I don't give up as a whole (though realistically I still have to give up on certain theorems and such if I don't have enough time/motivation 😜)
I'm not getting stuck, since I can ask questions, which prevents a negative feedback loop
Sometimes I get motivated enough to study certain ideas/theorems in depth and it's really nice to get positive feedback for them! That motivates me to do more in-depth studies, creating another positive feedback loop
But I still had certain bottlenecks
The pernicious philosophical WHY?! (The why of everything vs the why of particular things, the latter is useful, the former not so much)
I think I've sort of moved on from this, in the same way I've moved on from questions like "What is the meaning of life? 🤔": the question presupposes something that isn't there
Previously, I've also described maths as a set of self-consistent rules, metaphorically, "Maths as a Video Game"
But now, I think the better way to describe it is as an iterative/explorative process: start with certain assumptions/premises, then logically deduce new results. Metaphorically, "Maths as Advanced Logic"
Modelling maths as an explorative process implies, I think correctly, that there is an infinitely vast network of nodes (or even disconnected networks of nodes), only a tiny fraction of which are known/seen
This is what I think the "philosophical why" is getting at. Why do we have this configuration of nodes as opposed to some other, potentially more optimal, configuration? What else is out there, in the nodes we cannot reach? But it's inherently silly to try to reason about unknown things
I think what made me come to this realisation is a really amazing video about infinities, through the lens of finding a number matching the duration of a ball falling from a certain height. The video has a brilliant quote about how "We can't find the hay in the haystack":
Repetition
On one level, forgetting things is inevitable
My understanding of how memory works is that it's not possible to store everything, so the brain automatically clears out the file drawers after a while
But it's not efficient to learn everything from scratch either, so every time it clears out the file drawers it quickly marks down identifiers of the files (no idea how it does this, but we can point to how the immune system recognises viruses it's seen before to guess that it's possible)
The next time the brain identifies something it's seen before, it stores it for a longer period before clearing it, which is why Spaced Repetition and Anki/flashcards work, on some level
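As a toy model of that "store it for longer each time" behaviour (the doubling factor and reset rule here are my own made-up assumptions, not Anki's actual scheduling algorithm):

```python
from datetime import date, timedelta

# Toy spaced-repetition scheduler: each successful recall roughly
# doubles the review interval; a failed recall resets it to one day.
# (Doubling and resetting are illustrative assumptions only.)
def next_interval(interval_days: int, recalled: bool) -> int:
    if recalled:
        return interval_days * 2  # remembered: keep it "in the drawer" longer
    return 1                      # forgotten: start over from one day

interval = 1
for review, recalled in enumerate([True, True, True, False, True], start=1):
    interval = next_interval(interval, recalled)
    print(f"review {review}: next one in {interval} day(s), "
          f"around {date.today() + timedelta(days=interval)}")
```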
I think this is a bit of a weakness in our lessons: there's an invisible assumption that if I'm paying for a lesson, I should be taught new things. But if something was hard enough that I failed to learn it properly the first time, there's no real reason I can then learn it (or at least learn it efficiently) on my own just because I have more notes
This is mitigated somewhat by you recapping things I've forgotten on demand 😆, and by going through related topics, but for things like Cantor's Paradox, I'm way more interested in re-learning it now after watching the video mentioned above
Though, I probably wouldn't have clicked the video/grokked it at a conceptual level if we hadn't had that lesson on infinities first, which goes back to the point of learning being an iterative process
"Concept Size"
One problem with Anki is that flashcards are built around quick review of small concepts. It works exceptionally well in some cases, like language learning, where the units of learning (single words) are very small.
But for more complicated concepts, with ten different sub-concepts, either you get some sub-concepts wrong and mark the card correct anyway (which means you don't fix your mistakes), or you mark it wrong over and over as you try and fail to remember all ten sub-concepts correctly at the same time
Breaking one card into smaller cards for each sub-concept means the link between the sub-concepts is lost, and the flashcards become extremely unwieldy
The root cause of the problem is probably attempting to store every nitpicky detail, which is impossible because memory is finite (or at least memorising has a finite throughput; there's no reason to believe memory has finite storage 🤔). It's significantly more efficient to throw away 999 out of 1000 things to learn than to somehow learn all 1000 things 2x faster
Visualisation is extremely underused. Visualisation throws away a lot of unnecessary detail (reducing memory requirements) and visual information is much easier to learn and remember (Humans are hardwired to process visual information quickly)
I suspect it's because visualisation in maths is regarded as a crutch, and the belief is that these crutches will have to be discarded at a later stage. But 3Blue1Brown (and other maths Youtubers who followed him) have successfully visualised otherwise complicated concepts
From an engineering perspective, it's probably much cheaper to remember a visualisation and then derive/compute the definition from the visualisation than to remember the definition by itself (a real example: it's much faster to download a compressed file and unzip it than to download the uncompressed folder)
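The compressed-file analogy is easy to sanity-check in a few lines (this just demonstrates the analogy itself, nothing maths-specific):

```python
import zlib

# A highly structured "definition" compresses very well; storing the
# compressed form and unpacking on demand is cheaper than storing the
# raw text, just like remembering a picture and re-deriving the
# formal definition from it.
raw = ("for all epsilon > 0 there exists delta > 0 such that " * 50).encode()
packed = zlib.compress(raw)
print(len(raw), "bytes raw vs", len(packed), "bytes compressed")
assert zlib.decompress(packed) == raw  # nothing is lost in the round trip
```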
Caching
I'm going to use computer memory as a metaphor for human memory:
- There are a few different types of storage used in computers because there's a tradeoff between access speed and dollar cost, so the fastest-to-access storage has the smallest capacity to minimise cost
- Cache memory -> RAM (Random Access Memory) -> HDDs or SSDs -> External storage
- For maximum efficiency, the results of the most frequently used/most expensive computations are stored in the fastest to access storage
- When a result needs to be looked up, and it can be found in the cache, it's a cache hit. Else, it's a cache miss, and the result needs to be looked up in slower storage. It's possible that accessing the storage is so slow it's faster to recompute the result
- With learning, there's the storage of the brain, the storage of (searchable) notes, the storage of specific sources e.g. Wikipedia, Wolfram Alpha, Textbooks, and the storage of everything i.e. Google search. I've always tended to just use my brain and Google, which doesn't always work very well 🙃
- There's also the concept of "warming up" a "cold" cache by pre-loading data before use
I asked for a list of definitions and textbooks a couple of weeks ago, because I realised I was wasting a lot of time looking up stuff (or procrastinating on looking up stuff because I anticipated it would be a lot of work)
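A minimal sketch of the hit/miss/warm-up idea in code (the lookup cost and the terms are made-up stand-ins):

```python
import time

# Two-tier lookup: a dict plays the "brain" cache, and a slow function
# plays "dig it out of a textbook / Google it". The sleep is a made-up
# stand-in for how slow the fallback storage is.
cache: dict[str, str] = {}

def slow_lookup(term: str) -> str:
    time.sleep(0.5)  # pretend we're flipping through a textbook
    return f"definition of {term}"

def lookup(term: str) -> str:
    if term in cache:           # cache hit: instant recall
        return cache[term]
    result = slow_lookup(term)  # cache miss: fall back to slow storage
    cache[term] = result        # remember it for next time
    return result

# "Warming" the cold cache: pre-load the definitions you know you'll
# need, like asking for a list of definitions before starting a proof
for term in ["continuous", "differentiable"]:
    cache[term] = slow_lookup(term)

print(lookup("continuous"))    # hit: fast
print(lookup("Carathéodory"))  # miss: slow the first time only
```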
Textbooks were extremely scary to me, because I thought a textbook was just a giant pile of too many things to learn
But if I Google a theorem, or search it up on Youtube, I very often get hit with a wall of maths, so I have to spend quite a bit of time just figuring out what is relevant, if I figure it out at all
This is where I'm hoping textbooks come in
- Rather than a textbook being a pile of things I have to learn, I think it makes more sense to use it as a dictionary
- It's silly to try to memorise a textbook, just as it's silly to try to memorise a dictionary, so I don't need to stress myself out thinking I have to learn everything
- I'm trying out opening all the textbooks in a PDF reader and using the reader's search function to look for definitions in all textbooks at once. I've only just tried it out for Carathéodory's Theorem, and it seems to work well
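The same trick can be scripted. A rough sketch using the pypdf library ("textbooks/" is a placeholder folder, and PDF text extraction is imperfect on maths-heavy pages, so a miss doesn't prove the term is absent):

```python
from pathlib import Path
from pypdf import PdfReader

# Grep a folder of textbook PDFs for a term and print file + page hits.
def search_textbooks(folder: str, term: str) -> None:
    for pdf in sorted(Path(folder).glob("*.pdf")):
        for page_number, page in enumerate(PdfReader(pdf).pages, start=1):
            text = page.extract_text() or ""
            if term.lower() in text.lower():
                print(f"{pdf.name}, page {page_number}")

search_textbooks("textbooks", "Carathéodory")
```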
Framing learning as an efficient memory management problem also implies that I need to be careful about which theorems I store in my brain
Some are more useful than others in proving subsequent things, so I want them in my brain for quick access. I can strategically use flashcards or just do recall practice to "warm" the cache
The rest I can store in my notes (with or without proof) and access as needed
Carathéodory's Theorem
I've tried learning Carathéodory's Theorem with the above in mind, and I think it works quite well!
First, find resources, open Miro, textbooks, Youtube, etc. for quick lookup
Second, I always had trouble with definitions because I don't think I properly understood what they meant, so I tried making sure I understood the terms
(Definitions)
f: I -> R, c ∈ I
(Premises)
f is differentiable at c
(Fetch the definition of differentiable, and other related concepts)
- lim x->c (f(x) - f(c)) / (x - c) exists (and this limit is f'(c))
- Differentiable => Continuous
IFF
(Remember that Maths is just logic, we're not trying to prove either A or B in some general way, just if A => B and if B => A)
∃ φ: I -> R, φ is continuous
(Fetch the definition of continuous, and other related concepts)
- ∀ ε > 0 ∃ δ > 0 s.t. ∀ x, If |x - c| < δ, Then |f(x) - f(c)| < ε
- Convergent => Continuous
- Convergent => Bounded
f(x) - f(c) = φ(x) * (x - c)
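Putting the scattered pieces back into one statement (my reading of the theorem; the "continuous at c, not on all of I" detail is worth double-checking):

```latex
% Carathéodory's Theorem, reassembled:
\text{Let } f \colon I \to \mathbb{R},\ c \in I. \text{ Then } f \text{ is differentiable at } c
\iff \exists\, \varphi \colon I \to \mathbb{R} \text{ continuous at } c \text{ such that}
f(x) - f(c) = \varphi(x)\,(x - c) \quad \forall x \in I,
\text{and in that case } f'(c) = \varphi(c).
```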
Third, visualise the functions - f(x) and f'(x) are straightforward, and φ(x) gives the slopes of the chords of f through (c, f(c)), as you drew previously
Fourth, attempt to prove
Prove =>
(Premise) f is differentiable at c
Let φ(x) =
- (f(x) - f(c)) / (x - c) if x != c
- f'(c) if x = c
(This arbitrary definition works because we only have to prove ∃ φ)
φ is continuous for x != c because there it's a quotient of continuous functions, and continuous at c because lim x->c φ(x) = f'(c) = φ(c), which is exactly what differentiability gives us ✅
If x = c
- f(x) - f(c) = φ(x)(x - c) is 0 = 0 ✅
If x != c
- (x - c) != 0
- From definition of φ(x), we can just kick (x - c) over
- (f(x) - f(c)) = φ(x) * (x - c) ✅
Prove <=
(Premise) φ is continuous, and f(x) - f(c) = φ(x)(x - c) for all x
For x != c, kick (x - c) over: (f(x) - f(c)) / (x - c) = φ(x)
⚠️ The textbook then takes the limit of both sides and claims this is possible because φ is continuous, but this doesn't seem quite right 🤔 We've already seen that Continuous !=> Convergent, and we need convergence for a limit to exist ⚠️
But if we can apply the limit we get f'(c) = φ(c), as expected
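One possible resolution (my own guess, to confirm next lesson): continuity of φ at c is, by definition, exactly the statement that the limit of φ at c exists and equals φ(c), so no separate convergence assumption is needed:

```latex
% Unpacking "phi is continuous at c" (my reading -- please sanity-check):
\varphi \text{ continuous at } c \iff \lim_{x \to c} \varphi(x) = \varphi(c)
% so the limit on the right-hand side exists by assumption, giving
\lim_{x \to c} \frac{f(x) - f(c)}{x - c} = \lim_{x \to c} \varphi(x) = \varphi(c)
% i.e. f is differentiable at c with f'(c) = phi(c)
```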
Fifth, attempt to apply
Prove chain rule with Carathéodory's Theorem
Define Carathéodory's Theorem and chain rule (Will skip typing it out since it's a pain, but I wrote it on paper)
The important bit is that f and g are differentiable, which gives us the following (φ is the Carathéodory function for f at c, and ψ the one for g at f(c)):
- 1) f(x) - f(c) = φ(x) (x - c)
- 2) g(f(x)) - g(f(c)) = ψ(f(x)) (f(x) - f(c))
- 3) f'(c) = φ(c)
- 4) g'(f(c)) = ψ(f(c))
- from 1) and 2), g(f(x)) - g(f(c)) = ψ(f(x)) φ(x) (x - c)
- Kick the (x - c) over and take the limit of both sides
- (g ∘ f)'(c) = ψ(f(c)) φ(c) = g'(f(c)) f'(c) ✅
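And the same derivation as one chain of equations (my write-up; the subtle step, I think, is that ψ(f(x)) -> ψ(f(c)) needs f continuous at c, which differentiability gives us):

```latex
% Chain rule via Carathéodory, collecting the steps above:
g(f(x)) - g(f(c)) = \psi(f(x))\,\varphi(x)\,(x - c)
\implies \frac{g(f(x)) - g(f(c))}{x - c} = \psi(f(x))\,\varphi(x) \quad (x \neq c)
\implies (g \circ f)'(c) = \lim_{x \to c} \psi(f(x))\,\varphi(x)
       = \psi(f(c))\,\varphi(c) = g'(f(c))\,f'(c).
```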