This is not the direct result of a knowledge cutoff date, but could be the result of mis-prompting or fine-tuning to enforce cut off dates to discourage hallucinations about future events.
But, Gemini/Bard has access to a massive index built from Google’s web crawling-- if it shows up in a Google search, Gemini/Bard can see it. So unless the model weights do not contain any features that correlate Gaza to being a geographic location, there should be no technical reason that it is unable to retrieve this information.
My speculation is that Google has set up “misinformation guardrails” that instruct the model not to present retrieved information that is deemed “dubious”-- it may decide for instance that information from an AP article are more reputable than sparse, potentially conflicting references to numbers given by the Gaza Health Ministry, since it is ran by the Palestinian Authority. I haven’t read too far into Gemini’s docs to know what all Google said they’ve done for misinformation guardrailing, but I expect they don’t tell us much besides that they obviously see a need to do it since misinformation is a thing, LLMs are gullible and prone to hallucinations and their model has access to literally all the information, disinformation, and misinformation on the surface web and then some.
TL;DR someone on the Ethics team is being lazy as usual and taking the simplest route to misinformation guardrailing because “move fast”. This guardrailing is necessary, but fucks up quite easily (ex. the accidentally racist image generator incident)
This is not the direct result of a knowledge cutoff date, but could be the result of mis-prompting or fine-tuning to enforce cut off dates to discourage hallucinations about future events.
But, Gemini/Bard has access to a massive index built from Google’s web crawling-- if it shows up in a Google search, Gemini/Bard can see it. So unless the model weights do not contain any features that correlate Gaza to being a geographic location, there should be no technical reason that it is unable to retrieve this information.
My speculation is that Google has set up “misinformation guardrails” that instruct the model not to present retrieved information that is deemed “dubious”-- it may decide for instance that information from an AP article are more reputable than sparse, potentially conflicting references to numbers given by the Gaza Health Ministry, since it is ran by the Palestinian Authority. I haven’t read too far into Gemini’s docs to know what all Google said they’ve done for misinformation guardrailing, but I expect they don’t tell us much besides that they obviously see a need to do it since misinformation is a thing, LLMs are gullible and prone to hallucinations and their model has access to literally all the information, disinformation, and misinformation on the surface web and then some.
TL;DR someone on the Ethics team is being lazy as usual and taking the simplest route to misinformation guardrailing because “move fast”. This guardrailing is necessary, but fucks up quite easily (ex. the accidentally racist image generator incident)