The generalized Zollman effect says that the variance in probabilistic evidence manifests as either differences between the agents of a community (social variance) or unpredictability in the community's aggregate outcome (modal variance), depending on how correlated the agents' pools of evidence are.
So when agents share their evidence with more of their community, there is less social variance (the community is more homogeneous), but the community as a whole is less predictable. When agents share their evidence with fewer of their peers, there is greater social variance, but the community as a whole is more predictable.
The data splitter app to the right demonstrates this by simulating a community of agents that observe probabilistically generated data. Each agent produces their own set of data concerning some random variable (like doctors determining the success rate of a particular medical treatment).
Each agent's data is shown in the k=0 column, which captures the community when no one shares any of their data. Higher values of k indicate that agents share their evidence with more of their peers. (Note that higher connectivity means higher correlation between the pools of evidence.)
The last row shows the number of agents whose observed data agrees with the fact that the true success rate is above 0.5 (such agents are called data-lucky).
The figure shows the performance across multiple runs (batches of data). Each data point captures the number of data-lucky agents for a given level of connectivity for one run of the simulation.
Notice that the higher the connectivity, the closer the community is to either extreme (no one or everyone is data-lucky), because there is less room for social variance. Likewise, there is a greater difference from one run to the next (and between the lowest and highest performing runs).
Conversely, lower connectivity leads to greater homogeneity across modal space (between runs). But within each run, the community is closer to the midpoint, that is, closer to being evenly split between data-lucky and data-unlucky agents.
Random samples are generated based on the success rate:
0.5+ε
That means lower ε values make the problem harder, since the true rate sits closer to the 0.5 threshold.
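The sampling step can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code; the function name and parameters are assumptions.

```python
import random

def generate_samples(epsilon, n_trials, seed=0):
    """Draw n_trials Bernoulli samples with success probability 0.5 + epsilon."""
    rng = random.Random(seed)
    p = 0.5 + epsilon
    return [1 if rng.random() < p else 0 for _ in range(n_trials)]

# One agent's batch of data: a list of 0/1 outcomes.
samples = generate_samples(epsilon=0.05, n_trials=100)
observed_rate = sum(samples) / len(samples)
```

With a small ε, a batch of 100 trials will often show an observed rate below 0.5 even though the true rate is above it, which is exactly what makes some agents data-unlucky.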
Each agent generates their own set of random samples.
More trials per agent means more total evidence.
Each additional agent means more evidence but also more possible connections.
The table captures the total evidence an agent sees given a particular level of connectivity.
If we imagine all of the agents in a ring, the k indicates the number of adjacent agents on either side that an agent is connected to. So when k=1, an agent has 3 connections, when k=2, an agent has 5 connections, and so on.
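The ring pooling described above can be sketched as follows (a hypothetical helper, assuming each agent's data is a list of 0/1 outcomes; the names are not from the app's source):

```python
def pooled_evidence(agent, k, n_agents, data):
    """Pool an agent's own samples with those of its k nearest
    neighbours on either side of the ring (2k + 1 agents in total)."""
    pooled = []
    for offset in range(-k, k + 1):
        pooled.extend(data[(agent + offset) % n_agents])
    return pooled

# Toy data: each of 10 agents holds a single sample labelled by its index.
# With k=1, agent 0 pools data from agents 9, 0, and 1.
data = [[i] for i in range(10)]
```

The modulo wrap-around is what makes the network a ring rather than a line: agent 0's left-hand neighbour is the last agent.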
So each column shows the same set of evidence distributed differently. Higher values of k mean there is more overlap (correlation) between agents' pools of evidence.
The true success rate of the random variable is above 0.5, and so data-lucky agents are those who observe a success rate above 0.5. The last row of the table shows the total number of data-lucky agents for each level of connectivity.
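Counting data-lucky agents is then a matter of checking each agent's (pooled) evidence against the 0.5 threshold. A minimal sketch, assuming each pool is a list of 0/1 outcomes:

```python
def count_data_lucky(pools):
    """Count agents whose (pooled) evidence shows a success rate above 0.5."""
    return sum(1 for pool in pools if sum(pool) / len(pool) > 0.5)

# Three agents' pooled evidence; only the first observes a rate above 0.5
# (a rate of exactly 0.5 does not count as data-lucky).
n_lucky = count_data_lucky([[1, 1, 0], [0, 0, 1], [1, 0]])
```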
The figure shows the performance across multiple runs (batches of data). For each new batch of data, there is a new data point for each level of connectivity according to how many agents are data-lucky.
The data points are ordered according to the number of data-lucky agents, so the runs with fewer data-lucky agents are further to the left, and runs with more data-lucky agents are on the right.