PISTONHEADS SEARCH ENGINE

RacingPete

8,883 posts

204 months

Monday 6th July 2015
I will jump on this thread in a bit and explain the whole project, why we did it, the logic and technology used behind it etc...

But I will add a few things for now:
  • We release things incrementally, so what we have released now is not the end of development on this feature. We can tweak it and add better functionality (such as better pre-search criteria to stop you having to navigate the pyramid upside down).
  • Google's business is search - it would be hard to compete with a multi-billion pound organisation whose primary focus is building a great search engine. We do know how to get near to it for Forum Search, but we would need to build much more infrastructure to capture how users are using the search and which results they click on, and feed that back into relevance. The problem is justifying the cost of that detailed work; we all have to satisfy a bottom line at the end of the day.

greymrj

Original Poster:

3,316 posts

204 months

Monday 6th July 2015
Ok LordGrover, tried again. Put in 'TVR S brake servo', as written.

Search revealed lots of other marques. Went to TVR (which I shouldn't have had to do as it was in the search title) and opened. Went to S and hovered and got the option 'only', clicked on this and it gave me TVR S plus all the other marques. It did NOT restrict selection to S only. I didn't go further.

I am using Windows 7 on a good modern laptop and browsing using Mozilla (up to date).

greymrj

Original Poster:

3,316 posts

204 months

Monday 6th July 2015
OK Pete, I think it is over to you. You have had some pretty hard feedback on the system so far. Only fair to give you the opportunity to digest what has come back and consider the implications.
I have got to say it doesn't look like 'tweaking' to me. There seem to be some pretty fundamental issues to look at.

Maybe you should share the objectives of the project with us to see how relevant they are first?

My use is very regular but it is almost entirely related to gaining the latest technical information on repair and restoration, adding to that information from research I have done, and directing less experienced members to the best PH source of the advice they have sought. All that is almost entirely related to one model of one marque.

LordGrover

33,545 posts

212 months

Monday 6th July 2015
greymrj said:
Ok LordGrover, tried again. Put in 'TVR S brake servo', as written.

Search revealed lots of other marques. Went to TVR (which I shouldn't have had to do as it was in the search title) and opened. Went to S and hovered and got the option 'only', clicked on this and it gave me TVR S plus all the other marques. It did NOT restrict selection to S only. I didn't go further.

I am using Windows 7 on a good modern laptop and browsing using Mozilla (up to date).
1. There is no need to include TVR S in the search box - it may even skew the results.
2. If you selected Only next to S Series then something's wrong. On my computer it clears ALL other checkboxes. Perhaps that's where your issue lies?

RacingPete

8,883 posts

204 months

Monday 6th July 2015
greymrj said:
OK Pete, I think it is over to you. You have had some pretty hard feedback on the system so far. Only fair to give you the opportunity to digest what has come back and consider the implications.
I have got to say it doesn't look like 'tweaking' to me. There seem to be some pretty fundamental issues to look at.

Maybe you should share the objectives of the project with us to see how relevant they are first?

My use is very regular but it is almost entirely related to gaining the latest technical information on repair and restoration, adding to that information from research I have done, and directing less experienced members to the best PH source of the advice they have sought. All that is almost entirely related to one model of one marque.
Might be tomorrow morning the way today is going... but will do, and thank you for your use case - it is all helpful.

greymrj

Original Poster:

3,316 posts

204 months

Monday 6th July 2015
LordGrover said:
1. There is no need to include TVR S in the search box - it may even skew the results.
?
I am going to leave this to the webmaster for a bit. There is now plenty of evidence that the search function doesn't meet users' needs as it stands. However, before I do so, can I ask you to look again at the above statement you made? With respect, I suggest you think hard about that statement and what it means in the context of the objectives of a search function! I will leave that with you.

LordGrover

33,545 posts

212 months

Tuesday 7th July 2015
rofl I'm out.

RacingPete

8,883 posts

204 months

Tuesday 7th July 2015
Still flat out - not forgotten about this.

RacingPete

8,883 posts

204 months

Wednesday 8th July 2015
Why did we change Forum Search?

We have over 30 million posts, and around 10,000 new posts every day, across 200 plus forums.
Google may be easy to set up, and it lets us use their algorithms for searching, but it has limitations:
  • It doesn't index or search any hidden forums
  • You cannot limit to posts by a user
  • You cannot reliably search between dates
  • You cannot search specific forums
  • We are at the mercy of Google crawling our site for updates
  • We cannot customise it and have to accept Google's design
  • In our tracking we cannot distinguish traffic from the search from normal traffic arriving from Google
So we had some requirements for the new search system:
  • Search all the posts across the site
  • Or posts by a user
  • Or by a date
  • Even in hidden forums (useful for moderators)
  • Have real-time updates (within 1 minute of a post being added)
  • Be responsive across multiple devices
  • Cope with growth
  • Super fast response times
How does it work?

We use a technology called Elasticsearch to run the search underneath. It is a dedicated search technology based on Lucene and is becoming more and more popular. Elasticsearch has some features which mean that it can do relevancy better than we could programme ourselves:
  • Stemmers (so being able to take words such as "speeding" and search for "speed", "speeds" etc)
  • Shingle filters (these let it look at the proximity of words for relevance, so a mention of BMW and M3 somewhere in the same post is not as relevant as BMW M3 next to each other)
  • Stop words (removal of common English words from the search terms, e.g. and, or, in, if, it etc.)
  • Synonyms (allows us to map forum colloquialisms onto the same word, e.g. porker, porsche etc)
  • Phrase suggester (provides an alternative spelling if the user has entered something wrongly when searching)
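
As a rough illustration of what those analysis steps do to text, here is a toy sketch in plain Python. It is not Elasticsearch itself and not the site's actual configuration: the stop-word list, the synonym map, the very crude suffix stripping and all function names are assumptions made up for the example.

```python
import re

# Illustrative values only; a real analyser uses much larger lists.
STOP_WORDS = {"and", "or", "in", "if", "it", "the", "a"}
SYNONYMS = {"porker": "porsche"}


def crude_stem(word):
    """Strip a couple of common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, drop stop words, map synonyms, then stem each word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    return [crude_stem(SYNONYMS.get(w, w)) for w in kept]


def shingles(tokens, size=2):
    """Return runs of adjacent tokens, e.g. word pairs when size is 2."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


adjacent = shingles(analyze("My BMW M3 is great"))
apart = shingles(analyze("BMW owners love the M3"))
print("bmw m3" in adjacent, "bmw m3" in apart)
```

Both posts mention BMW and M3, but only the first produces the word pair "bmw m3", which is the kind of signal that lets adjacent words count for more than scattered ones.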
Now we do have some control over what the system will search, as each part of the forums is indexed, and we then apply a weighting to those fields. So the fields we are indexing are:
  • Forum title
  • Forum posts
  • Date
  • Forum poster name
  • Forum
We weight the title and posts equally at the moment (this is where we can do some tuning), but if multiple posts under the same forum title have hits in the search index then that thread will rise slightly higher in the results as regards relevance.
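
For anyone curious what field weighting can look like, here is a minimal sketch of an Elasticsearch-style multi_match query body built in Python. The field names thread_title and post_body and the function name are hypothetical, and this is not necessarily the query the site runs; the "field^boost" notation is the standard way of giving one field more weight than another.

```python
import json


def build_forum_query(terms, title_boost=1.0, body_boost=1.0):
    """Build a query body that searches two fields with adjustable weights.

    The field names here are made up for illustration.
    """
    return {
        "query": {
            "multi_match": {
                "query": terms,
                "fields": [
                    f"thread_title^{title_boost}",
                    f"post_body^{body_boost}",
                ],
            }
        }
    }


# Equal weighting, as described above.
print(json.dumps(build_forum_query("brake servo"), indent=2))

# A tuning experiment: make title matches count three times as much.
title_heavy = build_forum_query("brake servo", title_boost=3)
```

Raising title_boost is one way to favour threads that are about a subject over posts that merely mention it, at the cost of burying useful posts in threads with vague titles.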

Not all users put the specific thing they are talking about in the title, and there are quite a few threads which have the gem of the information in the post that is not in the title, especially those that say "Looking to buy Honda S2000" etc.

Date is not weighted very highly, as we feel that the content is the key to a result, but happy to look into this to see if recency should be weighted higher.
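
To make the recency trade-off concrete, here is a small sketch of one common approach: multiply the text score by a Gaussian decay on the age of the post, similar in shape to the decay functions Elasticsearch offers. The parameter names, the one-year scale and the blending formula are assumptions for illustration, not what the site uses.

```python
import math


def recency_weight(age_days, scale_days=365.0, decay=0.5):
    """Gaussian decay: 1.0 for a brand new post, `decay` at `scale_days` old.

    `decay` must be between 0 and 1 (exclusive).
    """
    if age_days <= 0:
        return 1.0
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    return math.exp(-(age_days ** 2) / (2.0 * sigma_sq))


def blended_score(text_score, age_days, recency_strength=0.0):
    """Mix text relevance with recency.

    recency_strength=0 ignores the date entirely; 1 applies the full decay.
    """
    weight = recency_weight(age_days)
    return text_score * ((1.0 - recency_strength) + recency_strength * weight)


# A strong match from five years ago against a weaker match from last week.
for strength in (0.0, 0.5, 1.0):
    old = blended_score(3.0, age_days=5 * 365, recency_strength=strength)
    new = blended_score(2.0, age_days=7, recency_strength=strength)
    print(strength, round(old, 3), round(new, 3))
```

With the strength at zero the older, stronger match wins; as the strength rises the recent post overtakes it, which is the tuning decision being discussed here.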

There are going to be multiple use cases for searching the forums, and the feedback we have had has been massively positive. But we will take your cases and look at how we can improve the result set for them.

greymrj

Original Poster:

3,316 posts

204 months

Wednesday 8th July 2015
I hear you. Let's give you time? Immediate thoughts:
Personally I almost always find date to be of very high importance so I would certainly want to see it have a higher weighting. Certainly I would have expected the last column to be in reverse order, i.e. latest as the default position, with the ability to search earlier if required.
One big potential advantage over Google is the potential ability to locate the last post on a subject, whereas Google finds the date by the start of the last thread on the subject.
Is it possible to give some difference in weighting to threads which have the search subject in the title, over those which have it in the content? i.e. to prioritise threads which are ABOUT the subject well over those which merely mention the subject. That differentiation did not seem to happen and most of the 'most relevant' items the search found were actually of low relevance to that search subject.
I was very worried by LordGrover's comment that by being more specific in the search subject it could skew the results. I have to say that amazed me. If I put in TVR S as part of the subject then I expect the search to be limited to posts relevant to TVR S!
I would certainly expect the first selection in the current format to 'tick' only what I asked for, if that was in my search subject. To have to uncheck the bits I do not want seems a poor approach. If you want to search across the whole of PH for someone mentioning 'servo' then by all means expand your search, but I would have thought the rest of us wanted the tip of the pyramid, or at least to start from there.

RacingPete

8,883 posts

204 months

Wednesday 8th July 2015
In the specific TVR S search, there is a very generic word "S" in it, which means it is hard for the system to know that it is a specific model... though we did look at telling the system that, and it is something we may revisit. If the search was for TVR Tuscan, you will probably see better results as that word is more distinctive.

The other thing Google does is to monitor the click through, dwell times after clicking, number of pages viewed etc and feed that back into the search results. This allows it to tune the results on user behaviour to better capture the sentiment behind a search. For example "TVR S steering rack" could have several reasons for that search. Maybe someone wants to buy one, or fix one, or just general info about one... or something else.

By looking at the results clicked on and driving that back in, then that helps move the better sentiment higher in the relevancy of results. This is something we are looking at, but won't be till the new year.
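
A very rough sketch of how click data could be fed back into ranking, purely as an illustration: the function name, the smoothing prior and the strength parameter are all made up, and this is neither Google's method nor anything built on the site yet.

```python
def click_adjusted_score(base_score, clicks, impressions,
                         prior_ctr=0.1, prior_weight=20, strength=1.0):
    """Nudge a relevance score up for results that users actually click.

    The click-through rate is smoothed towards `prior_ctr` so that a result
    shown only a handful of times is not judged on too little evidence.
    """
    smoothed_ctr = (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)
    return base_score * (1.0 + strength * smoothed_ctr)


# Two results with the same text relevance but different click histories.
popular = click_adjusted_score(2.0, clicks=50, impressions=100)
ignored = click_adjusted_score(2.0, clicks=1, impressions=100)
print(popular > ignored)
```

Capturing clicks and impressions per search is the infrastructure cost mentioned earlier in the thread; the scoring itself is the easy part.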

greymrj

Original Poster:

3,316 posts

204 months

Wednesday 8th July 2015
I take your point about 'S' being potentially generic, although it does still appear as a separate model in your first field. We S guys wouldn't like to be missed!
At the end of the day the proof will be in the pudding. As it stands it is of little value to me for the purposes for which I search on PH. I appreciate that this doesn't mean it is of little value to others. You may have noted another thread on the TVR S forum on which I asked other members to test the search facility. Several prominent members did, and the consensus rather supports my view.

Include S as a model!
I am not sure what LordGrover was on about but allow the searcher to be more rather than less specific in their search subject.
Make it so you select anything you want other than the subject model, rather than deselect.
Think hard about the weight given to 'date' and see what proportion of searches are answered best by latest information. I still remain to be convinced about defining 'relevance' as it is a qualitative rather than quantitative matter.
Give threads ABOUT a subject more 'relevance' than posts which merely mention the subject wording.
Initially filter to give priority to most up to date posts, with option to reverse this.

How long do I leave it before testing again!

227bhp

10,203 posts

128 months

Thursday 9th July 2015
I must be missing something here as I'm having difficulty fathoming out how 'Does not work' can be discussed for three pages!

I put in a single word or phrase
Search engine says "We found 0 results for your search "xxxx""

Even following the earlier hyperlinks gives the same results.

rscott

14,761 posts

191 months

Thursday 9th July 2015
227bhp said:
I must be missing something here as I'm having difficulty fathoming out how 'Does not work' can be discussed for three pages!

I put in a single word or phrase
Search engine says "We found 0 results for your search "xxxx""

Even following the earlier hyperlinks gives the same results.
What phrase did you try?

rscott

14,761 posts

191 months

Thursday 9th July 2015
greymrj said:
I take your point about 'S' being potentially generic, although it does still appear as a separate model in your first field. We S guys wouldn't like to be missed!
At the end of the day the proof will be in the pudding. As it stands it is of little value to me for the purposes for which I search on PH. I appreciate that this doesn't mean it is of little value to others. You may have noted another thread on the TVR S forum on which I asked other members to test the search facility. Several prominent members did, and the consensus rather supports my view.

Include S as a model!
I am not sure what LordGrover was on about but allow the searcher to be more rather than less specific in their search subject.
Make it so you select anything you want other than the subject model, rather than deselect.
Think hard about the weight given to 'date' and see what proportion of searches are answered best by latest information. I still remain to be convinced about defining 'relevance' as it is a qualitative rather than quantitative matter.
Give threads ABOUT a subject more 'relevance' than posts which merely mention the subject wording.
Initially filter to give priority to most up to date posts, with option to reverse this.

How long do I leave it before testing again!
It sounds like you'd prefer that it only searches on post subject by default.
That might be appropriate for your usage, but quite probably not for many other users. For example, a lot of the posts in GG are 'What Car' type posts which, more often than not, don't include any vehicle details in the title.
Similarly, I've found that searching across all forums by default returns more useful results for me and with fewer clicks.

I'd also prefer it if it defaulted to descending order when date is selected.

Perhaps it needs user-configurable defaults for 'search subject only' and 'default sort'?

227bhp

10,203 posts

128 months

Thursday 9th July 2015
rscott said:
227bhp said:
I must be missing something here as I'm having difficulty fathoming out how 'Does not work' can be discussed for three pages!

I put in a single word or phrase
Search engine says "We found 0 results for your search "xxxx""

Even following the earlier hyperlinks gives the same results.
What phrase did you try?
Nothing works whatsoever, everything (be it phrase or word) gives the same result:

"We found 0 results for your search "What phrase did you try""

LordGrover

33,545 posts

212 months

Thursday 9th July 2015
How queer... works for me.


rscott

14,761 posts

191 months

Thursday 9th July 2015
227bhp said:
rscott said:
227bhp said:
I must be missing something here as I'm having difficulty fathoming out how 'Does not work' can be discussed for three pages!

I put in a single word or phrase
Search engine says "We found 0 results for your search "xxxx""

Even following the earlier hyperlinks gives the same results.
What phrase did you try?
Nothing works whatsoever, everything (be it phrase or word) gives the same result:

"We found 0 results for your search "What phrase did you try""
Weird. What device/browser are you using - it's working fine for me on W7/Chrome.

227bhp

10,203 posts

128 months

Thursday 9th July 2015
W7, Chrome with McAfee security. Both PC and lappy are the same.

RacingPete

8,883 posts

204 months

Monday 13th July 2015
227bhp said:
W7, Chrome with McAfee security. Both PC and lappy are the same.
I have really struggled to reproduce this, so I am thinking it might be linked to your login details. I will try to emulate those settings and see what happens.