Because plenty of times what a child will do depends on what the parent was doing just before the fork, and in fact the child may simply be a bit of code that runs a background task related to the parent's foreground task.

Also, and this is a very important bit I think, fork started out before 'threads' were common, so starting another process to run the same code was a common solution. The communication between parent and child went through a Unix pipe. That way you could write one single program, and because the child starts out with a copy of everything the parent had at the moment of the fork, both parties have access to all the context.
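To make that pattern concrete, here is a minimal sketch in Python (Unix only). The function name `run_background_task` and the "sum a list" job are made up for illustration; the point is only that the child already has `numbers` in its own memory after the fork and uses the pipe just to send the result back.

```python
import os


def run_background_task(numbers):
    """Fork a child that sums `numbers` and reports the result over a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: `numbers` was inherited at fork time, nothing had to be passed in.
        os.close(read_fd)
        try:
            os.write(write_fd, str(sum(numbers)).encode())
        finally:
            # Exit immediately so the child never runs the parent's code path.
            os._exit(0)
    # Parent: close the unused write end, then read until the child closes its end.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return int(data)


if __name__ == "__main__":
    print(run_background_task([1, 2, 3]))
```

In a real program the parent would carry on with its foreground work and read from the pipe later, rather than blocking on it right away.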

The pages holding state are marked copy-on-write in the child, which makes fork very fast and pushes the copying of the state as far into the future as it can get away with. So forking a process with 10M resident is nearly as fast as forking a process with only 100K resident (the page tables still have to be duplicated, but not the data). When you modify memory in the child you pay page by page for the cost of cloning the parent, but never more than you actually need.
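A small sketch of what that looks like from the program's side, again in Python on Unix. The name `child_write_is_private` is hypothetical. It only demonstrates the visible semantics (the child's write never reaches the parent); the page-level copying itself happens in the kernel and is not observable from this code.

```python
import os


def child_write_is_private():
    """Show that a write made by the child after fork is invisible to the parent."""
    state = bytearray(b"parent")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        try:
            # This write makes the kernel give the child its own copy of the page.
            state[:] = b"child!"
            os.write(write_fd, bytes(state))
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        seen_by_child = reader.read()
    os.waitpid(pid, 0)
    # The parent's buffer is untouched; only the child saw the modification.
    return bytes(state), seen_by_child


if __name__ == "__main__":
    print(child_write_is_private())
```

One caveat with using Python for this: CPython updates reference counts inside objects even when you only read them, so a forked Python child tends to dirty more pages than the equivalent C program would.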

Clever programmers make sure that the state variables that are going to be modified by the child live close to each other, so that the writes land on as few pages as possible and only those pages get copied.

An alternative to that is to use shared memory and mutexes; that way you can get pretty close to the 'threaded' model using only processes.
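A rough sketch of that approach in Python on Unix, assuming an anonymous `mmap` region as the shared memory and a `multiprocessing` lock (created from the "fork" context) as the mutex. The function name `shared_counter` and its parameters are made up for the example.

```python
import mmap
import multiprocessing
import os
import struct


def shared_counter(n_children=4, increments=1000):
    """Let several forked children bump one counter that lives in shared memory."""
    # Anonymous mapping; on Unix it is MAP_SHARED by default, so it survives fork
    # as genuinely shared memory rather than copy-on-write.
    shared = mmap.mmap(-1, 8)
    lock = multiprocessing.get_context("fork").Lock()
    struct.pack_into("q", shared, 0, 0)

    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            try:
                for _ in range(increments):
                    # The read-modify-write must be atomic across processes.
                    with lock:
                        (value,) = struct.unpack_from("q", shared, 0)
                        struct.pack_into("q", shared, 0, value + 1)
            finally:
                os._exit(0)
        pids.append(pid)

    for pid in pids:
        os.waitpid(pid, 0)
    (total,) = struct.unpack_from("q", shared, 0)
    shared.close()
    return total


if __name__ == "__main__":
    print(shared_counter())
```

Without the lock the increments from different children could interleave and updates would get lost, which is exactly the kind of bug you would also have with threads.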