I have a site with dynamic pages (e.g. record.aspx?id=657), and new ones get added occasionally. Each of these pages can be reached by following various tags, and there's also a page (IP-restricted so only the Google Mini can see it) that lists all the records as links.
The mini is set up to do a full crawl every night at 3am.
Some of the most recent additions to the database don't seem to be showing up in the Mini's index. They're a couple of weeks old now and even show up in Google's main index.
Any ideas why the new records aren't being indexed?
-
Is there anything about that in the Mini's logs? If I remember correctly, there's also a place where you can check whether the Mini can access a file or not. Also make sure that the file types concerned aren't being excluded in the config.
Kirschstein : The file types definitely aren't being ignored because there are plenty of them already in the index. Can you remember exactly where you check to see if the mini can access a file?
vn : https://gewglemini:8443/EnterpriseController?actionType=contentDiagnostics&sort=crawled You could also use https://gewglemini:8443/EnterpriseController?actionType=networkSettings to check out if your gmini can reach that page! Hope this helps.
From vn -
So, just a thought, but have you exceeded the capacity of the device?
From warren -
Try this:
- Check that you're not exceeding the license and capacity of the device (Status and Reports > Crawl Status page).
- Check the values of "URLs Found That Match Crawl Patterns" and "Total Documents Being Served". The difference between them should be small; a large gap suggests that matching URLs aren't making it into the index.
- Check the crawl information for the domain (Status and Reports > Crawl Diagnostics) to see if all the pages are being indexed.
- If you have access to your web server's log files, force a recrawl of a page that isn't indexed and see how the web server responds (maybe a 404?).
- If, after all the tests above, the pages are being indexed but not shown in the results, I'd recommend upgrading the software on the Google Mini. This happened to me some time ago, and the upgrade resolved the issue.
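For the web server log check, here's a rough Python sketch that pulls the crawler's requests out of a log and tallies the status codes per URL. It makes two assumptions you should verify against your own logs: that they are in IIS W3C extended format with a `#Fields:` header, and that the Mini's user agent contains `gsa-crawler`. The function name `crawler_statuses`, the sample lines, and the `id=700` record are made up for illustration.

```python
from collections import Counter


def crawler_statuses(log_lines, agent_marker="gsa-crawler"):
    """Return {url: Counter of HTTP statuses} for requests made by the crawler.

    Assumes IIS W3C extended log lines, including the '#Fields:' header
    that names the columns. agent_marker is an assumed user-agent substring.
    """
    fields = []
    results = {}
    marker = agent_marker.lower()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header tells us which column holds which value.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if marker not in row.get("cs(User-Agent)", "").lower():
            continue  # not a request from the crawler
        url = row.get("cs-uri-stem", "")
        query = row.get("cs-uri-query", "-")
        if query != "-":
            url += "?" + query
        results.setdefault(url, Counter())[row.get("sc-status", "?")] += 1
    return results


# Made-up sample lines, only to show the expected input shape.
sample = [
    "#Fields: date time cs-method cs-uri-stem cs-uri-query cs(User-Agent) sc-status",
    "2009-06-01 03:00:05 GET /record.aspx id=657 gsa-crawler+(Enterprise) 200",
    "2009-06-01 03:00:06 GET /record.aspx id=700 gsa-crawler+(Enterprise) 404",
    "2009-06-01 03:00:07 GET /record.aspx id=700 Mozilla/5.0 200",
]

for url, statuses in crawler_statuses(sample).items():
    print(url, dict(statuses))
```

If a missing record never appears in the output at all, the crawler isn't requesting it, which points at crawl patterns or the link listing page. If it appears with a 404 or 500, the problem is on the web server side.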
From HD -
From the home page of your Google Mini's admin console:
- Click on "Status and Reports" in the left-hand column.
- From the drop-down, click on "Crawl Diagnostics".
Here you'll see four columns: Host Name, Crawled URLs, Retrieval Errors, and Excluded URLs. The values in these columns are hyperlinks to additional information about each column. If there are errors with these particular documents, you'll find your answer here.
From GregD