Interesting positioning. Cross-tool memory portability is the right direction. A practical trust layer to add: every recalled item should carry provenance + freshness metadata so agents can choose whether to trust, refresh, or ignore memory instead of treating all recalls as equally valid.
Thanks - glad it resonates. Great idea on the freshness metadata, that makes total sense. I'll add that for sure
//Edit: I've implemented this and it's live now
persistent memory across tools is the right problem to solve. the real friction isn't context length, it's context continuity -- picking up a claude session tomorrow and feeling like you never left. memobase looks solid for the memory layer. the missing piece most people ignore is session state itself: terminal output, working directory, what commands actually ran. memory without replay is just notes.
Yeah, 100% agree. That's something I was just thinking about yesterday as well - every session should be summarised and written to the data store too, so that sessions become portable contexts. There's potentially a case to be made for full replay, i.e. storing the plaintext sessions, but I'm not entirely sure how much more valuable that is than a summary.
Do you have thoughts, or a take on that?
summaries are probably 80% of the value at 10% of the storage cost. full replay is nice for debugging weird agent behavior but day-to-day you rarely need the raw transcript.
the interesting edge case is when the summary itself becomes the lossy artifact - like, who decides what's important enough to keep? if the model summarises, it might quietly drop the context that would have mattered most next week.
hybrid might be the move: rolling summary plus last N raw turns.
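A rough sketch of that hybrid shape, assuming a rolling summary that absorbs the oldest turn whenever it ages out of a fixed-size raw buffer. The class name and the `summarise` hook (which would call a model in practice) are made up for illustration:

```python
from collections import deque


class HybridMemory:
    """Rolling summary plus the last N raw turns."""

    def __init__(self, max_raw_turns: int = 10):
        self.summary = ""
        self.raw_turns = deque(maxlen=max_raw_turns)

    def add_turn(self, turn: str, summarise) -> None:
        # when the buffer is full, fold the oldest raw turn into the
        # rolling summary before the deque silently drops it
        if len(self.raw_turns) == self.raw_turns.maxlen:
            oldest = self.raw_turns[0]
            self.summary = summarise(self.summary, oldest)
        self.raw_turns.append(turn)

    def context(self) -> dict:
        # what you'd inject into the next session: cheap summary for the
        # distant past, verbatim turns for the recent past
        return {"summary": self.summary, "recent": list(self.raw_turns)}
```

The nice property is the fallback the comment above describes: if the summariser makes a bad bet, the last N turns are still there verbatim.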
Right - that makes sense. I see it similarly. My assumption, though, is that as models get better, the likelihood of the model missing the context that matters most will get lower and lower.
The hybrid approach is nice though - I'll have a think about it and see whether it's something I can incorporate. Thanks for the feedback, very much appreciated
yeah, models are definitely getting better at prioritizing signal over noise. but there's still an interesting edge case -- the context that matters most is often the stuff you didn't know mattered when you said it. like a throwaway comment three sessions ago that turns out to be load-bearing. hybrid at least gives you a fallback for when the model makes a bad bet on what's important.
100% - I'll implement and add that now :)
nice, ship it fast and break things. curious what your persistence layer looks like -- are you serializing full context or just surfacing semantic anchors? the anchor approach is way more interesting imo, it lets you reconstruct intent without dragging 50k tokens of cruft into every new session. that's the real trick nobody talks about
I'm not gonna lie - at the moment it's pretty basic: a chunked semantic store where relevant chunks are retrieved against criteria the agent can pass to the MCP server.
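For anyone curious what that pattern looks like in miniature, here's a toy version of criteria-filtered retrieval over chunks, with word overlap standing in for real embedding similarity. Everything here (field names, the scoring) is illustrative, not memobase's actual code:

```python
def retrieve(chunks, query, criteria=None, top_k=3):
    """chunks: dicts with 'text' and a 'tags' set; criteria: tags the agent requires."""
    query_words = set(query.lower().split())
    results = []
    for chunk in chunks:
        # drop chunks that don't satisfy the agent-supplied criteria
        if criteria and not criteria <= chunk["tags"]:
            continue
        # toy relevance score: word overlap (a real store would use embeddings)
        score = len(query_words & set(chunk["text"].lower().split()))
        if score > 0:
            results.append((score, chunk))
    results.sort(key=lambda pair: -pair[0])
    return [chunk for _, chunk in results[:top_k]]
```

The criteria filter is what makes the MCP surface useful: the agent can scope a recall to a project or topic instead of getting whatever is globally most similar.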
Context usage is definitely a problem on my mind too, and semantic anchors are one area I'm exploring, though I don't have a clear architecture jotted down for it yet. The real problem I'm facing right now is how to inject this into, say, Claude or ChatGPT and have those agents use it as a memory layer by default.
for the injection problem - the system prompt is your friend. just hardcode "before responding, query your memory MCP tool with relevant keywords" into the system prompt. crude, but it works. claude especially follows tool-use instructions pretty reliably when they're explicit. chatgpt less so, honestly. the real trick is making the retrieval call feel like the agent's idea, not an obligation - framing it as "you have a memory layer, use it" vs "you must call this tool" makes a surprising difference
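That framing difference is easy to encode as a tiny helper that builds the capability-style prompt. The tool name and wording below are invented, just a sketch of the idea:

```python
def memory_system_prompt(tool_name: str = "search_memory") -> str:
    # capability framing ("you have...") rather than obligation ("you must...")
    return (
        f"You have a persistent memory layer, exposed as the `{tool_name}` tool. "
        "Before responding, check it for relevant prior context by calling it "
        "with a few keywords drawn from the user's message."
    )
```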
That is what I do at the moment - I gotta update the instructions on the website to prod users to set things up this way.
The issue I perceive, though, is that adding an MCP server alone is not enough to modify the AI agent's system prompt. I tried using the MCP server description as an injected prompt to add these instructions to the system prompt, but that doesn't seem to work. I also tried adding sampling to the MCP server, which supposedly should be able to plug into messages, without luck, and tried to optimise for ChatGPT with an OpenAPI spec, etc.
The only way I've found to get those clients to use my memory layer is by doing what you describe - which is not exactly the user-friendly, one-click setup I'm after.
yeah, the system prompt injection problem is a real pain. the MCP spec wasn't really designed with "persistent context shaping" in mind - it's more request/response than ambient influence.
one hacky workaround I've seen work: expose a tool that's always called first, via a resource that hints at it - basically tricking the agent into self-priming. ugly but functional.
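One concrete way to do the self-priming, sketched as a plain MCP tool definition where the description itself carries the instruction, since the tool description is the one channel most clients reliably put in front of the model. The tool name and wording are invented:

```python
# MCP tools are declared with a name, a description, and a JSON Schema input;
# here the description doubles as the priming instruction
recall_tool = {
    "name": "recall_memory",
    "description": (
        "ALWAYS call this tool once at the start of a conversation. "
        "It returns prior context from the user's persistent memory layer "
        "relevant to the keywords you pass."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {"keywords": {"type": "string"}},
        "required": ["keywords"],
    },
}
```

Whether the client actually honors an "ALWAYS call this first" description varies a lot between hosts, which is the gap the next comment points at.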
honestly though, for true one-click you're probably stuck until clients expose better hooks. the spec needs to catch up to the use case.
this is super cool. can I get it to work as effortlessly as a chrome plugin? I use a lot of different models and a lot of specific/vertical AI products, and I'd love not to have to constantly re-feed them context to be useful.
love where this is headed!
Short answer: yes - just add the MCP server and you're golden. Longer answer: most chat clients don't allow the MCP server to inject its own system prompt, which means you have to specifically prompt your AI to write to memobase.