However, in RL we try to get by without the extensive and often hard-to-gather datasets. That is arguably the more general approach to learning anyway, if you think about how we humans tend to learn things. If you start playing a game like Pong, you most likely just start playing and figure things out as you go. Without any prior knowledge of how the game works or what the goals are, you probably won't be very good at first, but the score tells you whether what you did was any good, so you can adapt based on that score (or reward). This reward is also what our agent will use when adapting its behavior. Such a reward is usually quite easy to obtain (or even to construct) for many tasks.
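To make the idea of "adapting based on a reward" a bit more concrete, here is a minimal toy sketch (not the algorithm we'll build later): a hypothetical environment scores an action by how close it is to a hidden target, and the agent, knowing nothing about the game, simply tries random variations and keeps whatever the score favors.

```python
import random

# Hypothetical toy environment: the agent never sees TARGET directly,
# only the reward its action earns (higher is better, 0 is perfect).
TARGET = 7.0

def reward(action: float) -> float:
    return -abs(action - TARGET)

# No prior knowledge: start somewhere arbitrary and learn from reward alone.
best_action = 0.0
best_reward = reward(best_action)

random.seed(0)
for _ in range(1000):
    candidate = best_action + random.uniform(-1.0, 1.0)  # try a variation
    r = reward(candidate)
    if r > best_reward:  # adapt: keep whichever behavior scored better
        best_action, best_reward = candidate, r

print(round(best_action, 2))  # ends up close to the hidden target
```

This is just random hill climbing, far simpler than real RL, but it captures the core loop: act, observe the reward, and shift behavior toward whatever scored well.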